ABSTRACT

Team pursuit-evasion games are studied here with one performance index
for the team as a unit in competition with one common opponent. Particular
structures of team games are discussed after a brief introduction to
two-player differential games. The classical calculus of variations is used
to derive the feedback strategies for team linear quadratic pursuit-evasion
games. Several definitions of the performance index that correspond to
different levels of cooperation and hierarchical organization in the team
are investigated. The game of kind analysis partitions the players and the
space according to their role in the team. Practical solutions to these
complex problems rely best on suboptimal schemes. Thus a structural analysis
is presented with the intent to simplify the computation of optimal decision
and communication processes. Then approximate solutions as well as suboptimal
hierarchies for linear quadratic team games are derived. Two-player games
provide a great deal of information concerning the solution of team games:
they allow an approximate solution of a three-player game to be computed
using a composition method, and the exact solution of a complex linear
quadratic team game to be derived from a controllability study that provides
terminal-time criteria for the selection of unknowns. Hierarchical structures
naturally arise; in particular, different filtering structures for a
stochastic team game are compared. Detection and localization of the opponent
players require processing from several sources. In the underwater case,
direction finding techniques may fail because of the environment (multipath
propagation) or, in competitive situations, because of jamming signals. The
nonlinear processing method developed to alleviate these difficulties also
increases the class of problems solved by a given aperture, and is based on
the eigenstructure method applied to Mth-order multiplicative signals.
TEAM DIFFERENTIAL GAMES AND NON-LINEAR SIGNAL PROCESSING

I. INTRODUCTION

Many control problems involve the modelling of some unknown,
possibly deterministic, processes, such as the signal processing of
measurements from a target, performed by one or more channels. In
the absence of knowledge of the target motion, most models assume
either a stochastic type of dynamics or a given simple motion, such
as a constant course. Then various techniques are used to recover
from unexpected target maneuvers, including input estimation, variable
dimension filtering approaches, and white noise models with adjustable
level or with several levels (Reference [1] provides further
bibliography). The signal processing task could be made easier if the
behavior of the targets were partly known. But usually, in a hostile
environment, the target expects to be tracked or chased; thus, a
competitive situation arises between the various observing channels
processing the data and the unwilling target, requiring the study of
a team pursuit-evasion game problem.

First introduced in the pioneering work of von Neumann and
Morgenstern [2], discrete game analysis embodied most of the
principles of game theory still recognized as important. But it was
not until the concurrent development of optimal control theory and the
introduction of differential games by Isaacs [3] that game theory came
of age.


After a ten-year initial surge triggered by the publication of
Isaacs' work, the study of games was mainly taken up by mathematicians
rather than engineers. A more mathematical approach to games emerged;
the early results were given a more solid theoretical support and were
extended, as in Friedman [4]. In particular, bargaining games, in
which N players individually attempt to achieve some optimal
performance by contracting alliances, came to be better understood.
Nevertheless, the last few years have seen a renewed interest from the
engineering community in differential games, a typical problem being
the stochastic missile versus airplane control problem viewed as a
game (Jarmark [5], Shinar [6]). Together with the progress made in
multi-target tracking (Bar-Shalom [7]), the 1-vs.-N game, where a
single missile faces several possible targets, was also considered
(Breakwell [8]).

This brief history of games is herein schematically partitioned
into four different "eras"; though questionable, this classification
has the merit of highlighting the various N-player games considered.
First came, as a generalization, the games in which the N players
compete individually; then a few particular games were solved relying
on geometrical or intuitive remarks, as in the game where two cutters
attempt to prevent the escape of an evader (Isaacs [3]). Then came the
bargaining games and coalition formation problems, both with a clear
economic or political ulterior motive, and last came the
one-versus-many games, where a single pursuer must choose between
several possible evaders as a first target.

Team differential pursuit-evasion games studied here involve
coplayers who strive against a common opponent, such as an athletic
team with a common goal. The individual performance is superseded by
that of the team as a whole. Clearly, this is not a typical N-player
game of the more common variety. Nor can it be treated as a second
phase of a bargaining game with coalitions already formed, such as
studied by von Neumann and Morgenstern [2], since the principle of
optimality does not apply to the individual players but only to the
team as a unit.

In the literature, team decision theory refers to the games
where a group of agents, acquiring different information, work
together in a coordinated effort to achieve a common goal, as in Bagchi
and Basar [9], but the competitive aspect is missing in these games.

Hereafter, the common opponent game is referred to as the team
game or the N-versus-one game. The players of the game are called
"pursuers" and the common opponent, "evader", in keeping with the
spirit of "sink the Bismarck".

Solving even the simplest team game is a very difficult task,
and to cope with the curse of dimensionality, advantage will be taken
of the analysis of the simpler 1-vs.-1 game. Long before the battle of
Salamis (480 BC), the need for a strict organization in obtaining
optimal results from individual elements was recognized. Therefore,
hierarchical structures are introduced, and sometimes simplified, as
early as possible; decentralized and suboptimal structures are looked
for because they usually yield tractable solutions or more robust
schemes.

In Chapter II, matrix games and the well-known homicidal chauffeur
game are used as a guideline to present a few concepts about
differential games whose understanding is required. Chapter III
discusses various pitfalls and underlying assumptions in the formulation
of a team game; the calculus of variations is applied to yield the
necessary conditions of optimality for team games. The game of kind,
the partitioning of the pursuers and early structural game solutions are
the object of Chapter IV. The next chapter focuses on linear quadratic
team differential games, expressing the solution in a compact form and
deriving C³ (command, control, communication) suboptimal structures.
A composition approximation to reduce the computations required for
linear quadratic team games is presented in Chapter VI. Chapter VII
studies a fixed terminal time quadratic game from a controllability
point of view to provide criteria to select the terminal time unknowns,
and shows how it yields the solution to a complex problem of optimal
location of a pursuer in a team. Stochastic team games are envisioned
in Chapter VIII, for which hierarchical options prove fundamental. The
next chapter features a non-linear signal processing technique to detect
and locate the various players in an unfriendly environment, where
multipath propagation and jamming either cause the classical linear
methods to fail or, at best, considerably reduce the effective array
aperture.


II.  TWO-PLAYER  DIFFERENTIAL  GAMES 


1.  INTRODUCTION 

Let S be a system including two variables u and v controlled by
two distinct parties P1 and P2 who strive to maximize two corresponding
performance indices J1 and J2. Then, a game situation arises whenever
the control policy of either party at the present time is not known by
both parties, hereafter named "players".

When each player has all the information about the system to control,
the form of the performance indices and the other player's choice of
strategies, the game is a perfect information game.

Particular games for which only one player is aware of the
other player's strategy are called hierarchical or Stackelberg games.
One player, the leader, announces his strategy first and the other
player, the follower, reacts accordingly. When both maximizing
players have identical performance indices and goals, a Pareto game
is defined. Otherwise, whenever J1 and J2 differ, conflicting
interests create a competitive situation. The game is a zero-sum game if
J1 + J2 = 0, that is, when the interests of the players are opposite;
otherwise, a non-zero-sum game is defined. Competitive games are
usually defined in terms of a maximizing and a minimizing player.

When the number of strategies available is countable, the game
is said to be discrete. If there is a finite number of possible
strategies, a payoff can immediately be associated with each playable
control pair (u,v) and the game can be put under a convenient matrix
form. On the other hand, an uncountable number of possible strategies
characterizes continuous games. A differential game is a continuous
game for which the system on which the controls apply is defined in
terms of a set of lumped differential equations. Classical examples
are Lanchester's equations and the predator-prey equations.

The study of gambling gave game theory its name, but the most
obvious application of game theory is to clear competitive situations
such as pursuit-evasion games, combat models and macro-economic
behavior and strategies. Yet, a very important domain of application
of game theory is in optimization problems with unpredictable
parameters or forcing functions. The classical method solves that
problem as a stochastic control problem, modelling the unknown as a
random parameter, provided that the statistics of the unknown are
specified a priori. Otherwise, a worst case study might be necessary.
This conservative approach assumes that the unknown parameter
or forcing function is controlled by an intelligent adversary, in a
zero-sum game formulation. The method can be used to replace a
stochastic problem by a deterministic one. Problems such as ship
collision avoidance or games against nature are treated this way.

A few of the important concepts concerning two-player game
theory are introduced below, rather informally, in order to provide
a basis for the study of team games. A more complete study of
discrete games can be found in [2]; [3] introduces differential games,
and pursuit-evasion games are the focus of [10].

2.  DISCRETE  GAMES 

In the following perfect information, zero-sum competitive
discrete game, the maximizing player P1 has two available strategies,
$u_1$ and $u_2$, that select a row in the matrix, while P2, the
minimizing player, can choose either column 1 or 2.

Table 1. Matrix game with a pure strategy. (Rows: strategies $u_1$,
$u_2$ of player P1; columns: strategies $v_1$, $v_2$ of player P2.)

The payoff corresponding to the play $(u_i, v_j)$ is the matrix
coefficient $L_{ij}$. Since both players must announce their strategy at
the same time, it seems natural for P1, the maximizer, to choose the row
with the largest minimum and for P2, the minimizer, to choose the column
with the smallest maximum. Then:

$$ \max_{u_i} \min_{v_j} L_{ij} = 3 , \quad \text{P1 plays first,} $$

and

$$ \min_{v_j} \max_{u_i} L_{ij} = 3 , \quad \text{P2 plays first.} $$
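As an illustration, the two security levels above can be computed by a direct search over rows and columns. The sketch below is a minimal Python version, not part of the original analysis: the matrix entries are hypothetical (the values of Table 1 are not reproduced here) and were chosen only so that both levels equal 3; the function name `security_levels` is likewise an assumption of this sketch.

```python
def security_levels(payoff):
    """Return (maximin, minimax, saddle) for a zero-sum matrix game.

    payoff[i][j] is the amount received by the row player (the maximizer)
    when row i and column j are played. saddle is the (row, column) index
    pair of a pure-strategy saddle point, or None when the levels differ.
    """
    row_minima = [min(row) for row in payoff]
    col_maxima = [max(col) for col in zip(*payoff)]
    maximin = max(row_minima)  # what the maximizer can guarantee
    minimax = min(col_maxima)  # what the minimizer can enforce
    saddle = None
    if maximin == minimax:
        saddle = (row_minima.index(maximin), col_maxima.index(minimax))
    return maximin, minimax, saddle


# Hypothetical entries, chosen so that both levels equal 3.
L = [[4, 3],
     [5, 1]]
print(security_levels(L))
```

With these assumed entries the search selects the first row and the second column. The cost of the search grows with the size of the matrix, since every entry is visited.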


When both expected payoffs match, the result is called the "value of
the game". The strategy $(u^* = u_1, v^* = v_2)$ is the equilibrium
solution, named the "minimax strategy".

The main problem with the minimax strategy is that it requires
an exhaustive search over all the possible strategies. Clearly,
this is unsuitable for matrices of large dimension or for continuous
games. A more quickly converging strategy is the Nash equilibrium
strategy. It is defined as:

$$ u^* = \arg\left( \max_{u_i} L(u_i, v^*) \right) , $$

$$ v^* = \arg\left( \min_{v_j} L(u^*, v_j) \right) . $$

The two above equations must hold simultaneously, and the search is
made for a fixed assumed optimal strategy of the other player. It
can immediately be understood that a Nash strategy corresponds to a
local equilibrium concept, whereas the minimax strategy corresponds to
a global equilibrium. Consequently, uniqueness is not ensured, as
the following example shows, in which both minimizing players P1 and
P2 face two possible Nash equilibria, namely $(u_1, v_1)$ and
$(u_2, v_2)$, and a minimax solution.

Table 2. Matrix game with a mixed strategy. (Rows: strategies of
player P1; columns: strategies $v_1$, $v_2$ of player P2.)
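The local nature of the Nash equilibrium can be illustrated by enumerating the pure-strategy equilibria of a small game with two minimizing players. The following Python sketch is only an illustration under stated assumptions: the two cost matrices are hypothetical (they are not the entries of Table 2) and were chosen so that two equilibria coexist, and `pure_nash_equilibria` is a name introduced here.

```python
def pure_nash_equilibria(cost1, cost2):
    """List the pure-strategy Nash equilibria of a two-player cost game.

    cost1[i][j] and cost2[i][j] are the costs paid by P1 (who picks row i)
    and by P2 (who picks column j); both players minimize. A pair (i, j)
    is an equilibrium when neither player can lower his own cost by
    deviating alone.
    """
    n_rows, n_cols = len(cost1), len(cost1[0])
    equilibria = []
    for i in range(n_rows):
        for j in range(n_cols):
            p1_cannot_improve = all(cost1[i][j] <= cost1[k][j] for k in range(n_rows))
            p2_cannot_improve = all(cost2[i][j] <= cost2[i][k] for k in range(n_cols))
            if p1_cannot_improve and p2_cannot_improve:
                equilibria.append((i, j))
    return equilibria


# Hypothetical costs with two coexisting equilibria.
J1 = [[1, 4],
      [3, 2]]
J2 = [[2, 3],
      [4, 1]]
print(pure_nash_equilibria(J1, J2))
```

Each test only compares a candidate pair with unilateral deviations, which is why several pairs can pass it at once.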


Moreover, games for which no unique choice, named a pure strategy,
can satisfy $\min \max L_{ij} = \max \min L_{ij}$ have no Nash strategy
in pure strategies. Then, as the game repeats itself, the controls must
be chosen in a probabilistic way, defining a "mixed strategy". The
corresponding averaged payoff then satisfies
$\min \max L_{ij} = \max \min L_{ij}$.

A well-known theorem states that the value of a game always exists in
pure or mixed strategies for matrix games.
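For a 2 x 2 zero-sum game without a pure-strategy saddle point, the mixed strategy and the value follow from the classical equalizing argument: each player randomizes so that the opponent's two choices yield the same average payoff. The Python sketch below applies the resulting closed-form expressions; the payoff entries are hypothetical and the function name `mixed_solution_2x2` is an assumption of this example.

```python
from fractions import Fraction


def mixed_solution_2x2(payoff):
    """Mixed solution of a 2 x 2 zero-sum game with integer entries.

    Returns (p, q, value): p is the probability that the maximizer plays
    his first row, q the probability that the minimizer plays his first
    column, and value the average payoff. Raises ValueError when a
    pure-strategy saddle point exists, since the formulas do not apply.
    """
    (a, b), (c, d) = payoff
    maximin = max(min(a, b), min(c, d))
    minimax = min(max(a, c), max(b, d))
    if maximin == minimax:
        raise ValueError("the game has a saddle point in pure strategies")
    denom = a - b - c + d
    p = Fraction(d - c, denom)  # equalizes the payoff over both columns
    q = Fraction(d - b, denom)  # equalizes the payoff over both rows
    value = Fraction(a * d - b * c, denom)
    return p, q, value


# Hypothetical entries without a pure-strategy saddle point.
p, q, value = mixed_solution_2x2([[2, -1], [-1, 1]])
print(p, q, value)
```

Exact rational arithmetic is used so that the equalizing property can be checked without rounding.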


Though simpler in their definition, discrete games show many of
the important characteristics of continuous games, such as the
information structure, the various equilibrium strategies, the definition
of a unique quantity called the value of the game, and the possible
occurrence of a mixed strategy.

3. A PURSUIT-EVASION DIFFERENTIAL GAME: THE HOMICIDAL CHAUFFEUR GAME

3.1  Presentation  of  the  game 


A general form for the performance index of a zero-sum game is

$$ J(u,v) = g(x(t_f), t_f) + \int_{t_0}^{t_f} h(x,u,v,t)\,dt . \tag{1} $$

If $g(x(t_f), t_f) = 0$ and $\max_v \min_u h(x,u,v,t) \geq 0$, the game
is said to be a generalized pursuit-evasion game.

It is assumed that the differential equations describing the
motion or policy of the various players, as well as the initial
positions, are given. Then, as the game proceeds, the evader attempts
to avoid the various capture zones related to the pursuers. The
classical method divides the game initially, at $t = t_0$, into two
distinct phases, named the game of kind and the game of degree.

The game of kind can be summarized as finding the answer to the
question: is capture possible? Once capture is assumed to be possible,
the game of degree attempts to find the optimal way to conclude

i.e. classical optimal control theory, heuristic approaches relying
on simple geometrical considerations and approximation methods
tailored to specific games.

To illustrate further definitions and remarks, a specific example
is treated. The homicidal chauffeur game, also known as "the
two-car problem", is one of the best examples of pursuit-evasion games.
Both its simplicity and its versatility are remarkable. The derivation
of the classical one-pursuer-one-evader solution borrows heavily
from Isaacs [3].

A pursuer P, of speed $p$ and control $u$, attempts to capture, in
minimum time, an evader E of speed $e$ and control $v$. The heading
angle rates are subject to the controls. The dynamics obey the
following lumped differential equations

$$ \dot{x}_{1p} = p\sin(\theta_p) , \quad \dot{x}_{2p} = p\cos(\theta_p) , \quad \dot{\theta}_p = u , \tag{2} $$

for the pursuer, and the dynamics of the evader are

$$ \dot{x}_{1e} = e\sin(\theta_e) , \quad \dot{x}_{2e} = e\cos(\theta_e) , \quad \dot{\theta}_e = v . \tag{3} $$

In an unconstrained space, only the relative position of the two
players matters. The classical approach reduces the game by fixing P
at the origin, defining the new coordinates according to Figure 1,
thereby reducing the state equations to

$$ \dot{x}_1 = -u x_2 + e\sin(v) , \quad \dot{x}_2 = u x_1 + e\cos(v) - p . \tag{6} $$

In general, $p > e$ is enforced; thus the pursuit-evasion game represents
a study of the speed versus maneuverability type.
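The reduced state equations (6) lend themselves to a direct numerical experiment. The Python sketch below is only an illustration under stated assumptions: it uses a plain Euler step, takes capture to mean entering a circle of radius r around the pursuer, and uses hypothetical policies and parameter values; `reduced_rhs` and `time_to_capture` are names introduced here.

```python
import math


def reduced_rhs(x1, x2, u, v, p, e):
    """Right-hand side of the reduced state equations (6)."""
    dx1 = -u * x2 + e * math.sin(v)
    dx2 = u * x1 + e * math.cos(v) - p
    return dx1, dx2


def time_to_capture(x1, x2, u_policy, v_policy, p, e, r, dt=1e-3, t_max=100.0):
    """Euler integration until E enters the circle of radius r around P.

    u_policy and v_policy map the reduced state (x1, x2) to a control.
    Returns the elapsed time, or None if no capture occurs before t_max.
    """
    for k in range(int(round(t_max / dt)) + 1):
        if x1 * x1 + x2 * x2 <= r * r:
            return k * dt
        u = u_policy(x1, x2)
        v = v_policy(x1, x2)
        dx1, dx2 = reduced_rhs(x1, x2, u, v, p, e)
        x1 += dt * dx1
        x2 += dt * dx2
    return None


# Hypothetical scenario: E starts 5 units dead ahead of P and runs straight
# ahead (v = 0) while P does not turn (u = 0); the range closes at p - e.
t_capture = time_to_capture(0.0, 5.0, lambda a, b: 0.0, lambda a, b: 0.0,
                            p=2.0, e=1.0, r=1.0)
print(t_capture)
```

In this tail chase the range shrinks from 5 to 1 at the rate p - e = 1, so the returned time should be close to 4, up to the step size.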

Capture is achieved whenever E is forced within the terminal
manifold, or lethal area, of P, described by a matrix $M$ and a scalar
$r$ as

$$ x^T M x \le r^2 . \tag{7} $$

In the sequel, $M = I$, and the lethal area is a circle of radius $r$;
the terminal manifold is described by

$$ x_1^2 + x_2^2 - r^2 \le 0 . \tag{8} $$


3.2  Solution  of  the  game  by  the  calculus  of  variations 


The performance index

$$ J(u,v) = \int_{t_0}^{t_f} dt = t_f - t_0 , \tag{9} $$

is associated with the state (6) in the Hamiltonian

$$ H(u,v) = 1 + \lambda_1 (-u x_2 + e\sin(v)) + \lambda_2 (u x_1 + e\cos(v) - p) , \tag{10} $$


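by adjoining the costate vector $\lambda = (\lambda_1, \lambda_2)$. The
costate variables propagate according to

$$ \dot{\lambda}_i = -\frac{\partial H}{\partial x_i} , \quad i = 1, 2 , \tag{11} $$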
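As a worked step, and assuming that the costate equations take the standard form $\dot{\lambda}_i = -\partial H / \partial x_i$, differentiating the Hamiltonian (10) with respect to the reduced states gives

$$ \dot{\lambda}_1 = -\frac{\partial H}{\partial x_1} = -u \lambda_2 , \qquad \dot{\lambda}_2 = -\frac{\partial H}{\partial x_2} = u \lambda_1 . $$

For a constant control $u$, these equations describe a rotation of the costate vector at the rate $u$: indeed $\frac{d}{dt}(\lambda_1^2 + \lambda_2^2) = 2\lambda_1(-u\lambda_2) + 2\lambda_2(u\lambda_1) = 0$, so the norm of the costate is preserved along the trajectory.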


and the transversality condition provides a terminal time relationship
as

$$ \lambda_1(t_f) = \nu \frac{\partial (x_1^2 + x_2^2 - r^2)}{\partial x_1} = 2\nu x_1(t_f) , \quad \lambda_2(t_f) = \nu \frac{\partial (x_1^2 + x_2^2 - r^2)}{\partial x_2} = 2\nu x_2(t_f) , \tag{12} $$

expressing the orthogonality of the costate vector with the terminal
manifold at the terminal time; $\nu$ is a Lagrange multiplier whose value
must be such that

$$ H(t_f) = 0 \tag{13} $$

holds.

The optimal controls corresponding to a Nash equilibrium strategy,
or defining a game theoretic saddle-point, are computed by applying
the minimum (and maximum) principle as

$$ u^* = \arg\left( \min_{|u| \le U} H(u, v^*) \right) , \quad v^* = \arg\left( \max_{v} H(u^*, v) \right) . \tag{14} $$

Generally, a game theoretic saddle-point does not exist unless the
Hamiltonian is separable, that is, unless Isaacs' condition

$$ H(u,v) = H_u(u) + H_v(v) , \tag{15} $$

is satisfied.

The knowledge of the initial state together with (12) allows
classification of the one-pursuer-one-evader homicidal chauffeur game
as a classical two-point boundary-value problem.


Developing the above equations, introducing a reverse time
$\tau = t_f - t$ and a terminal hit angle $\alpha$, the optimal policies
can be derived as

$$ u^* = U\,\mathrm{sign}\left( x_2 \sin(U\tau + \alpha) - x_1 \cos(U\tau + \alpha) \right) , \quad v^* = U\tau + \alpha , \tag{16} $$

where sign is the signum function.
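The structure of these policies can be recovered pointwise from the Hamiltonian: the terms in u are linear, giving a bang-bang pursuer control, while the evader aligns his heading with the costate vector. The Python sketch below illustrates this under the assumption that the Hamiltonian is 1 + lam1*(-u*x2 + e*sin v) + lam2*(u*x1 + e*cos v - p), with costate components lam1 and lam2; the function names are introduced for this example only.

```python
import math


def sign(z):
    """Signum function: -1, 0 or +1."""
    return (z > 0) - (z < 0)


def saddle_controls(x1, x2, lam1, lam2, U):
    """Pointwise minimizing u and maximizing v of the assumed Hamiltonian.

    The coefficient of u in H is (lam2*x1 - lam1*x2); minimizing over
    |u| <= U gives u = U*sign(lam1*x2 - lam2*x1). The terms in v,
    e*(lam1*sin v + lam2*cos v), are largest when the heading v points
    along the costate, i.e. v = atan2(lam1, lam2).
    """
    u = U * sign(lam1 * x2 - lam2 * x1)
    v = math.atan2(lam1, lam2)
    return u, v


# Costate taken along (sin(0.8), cos(0.8)), i.e. U*tau + alpha = 0.8.
u, v = saddle_controls(0.2, 1.0, math.sin(0.8), math.cos(0.8), U=1.0)
print(u, v)
```

A vanishing switching function returns u = 0, which flags a candidate singular arc rather than resolving it.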

The retrograde path equations, integrated from a terminal hit
point in the reduced coordinate system, are

$$ x_1(\tau) = p - p\cos(U\tau) + e(r - \tau)\sin(U\tau + \alpha) , \quad x_2(\tau) = p\sin(U\tau) + e(r - \tau)\cos(U\tau + \alpha) . \tag{17} $$

Equations (17) are valid up to the first switch in $u$; then, according
to the new value of $u$, a new set of differential equations is to be
integrated until the next switch. A trial and error procedure must
be used to find the proper hit angle that corresponds to the initial
condition $(x_1(t_0), x_2(t_0))$.
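The retrograde integration described above can also be carried out numerically. The following Python fragment is a minimal sketch, not the procedure used in this work: it assumes the reduced dynamics (6), costate equations of the standard form (time derivative of the costate equal to minus the gradient of H), a unit terminal costate along the hit direction (its magnitude does not affect the controls), and a plain Euler step in reverse time. The name `retrograde_path` and all numerical values are hypothetical.

```python
import math


def sign(z):
    """Signum function: -1, 0 or +1."""
    return (z > 0) - (z < 0)


def retrograde_path(alpha, p, e, r, U, tau_end, dtau=1e-3):
    """Integrate state and costate backward from a hit point at angle alpha.

    Returns a list of (tau, x1, x2). In reverse time tau = tf - t the signs
    of the forward equations are flipped. The switching function is
    re-evaluated at every step, so switches in u need no special handling.
    """
    x1, x2 = r * math.sin(alpha), r * math.cos(alpha)
    lam1, lam2 = math.sin(alpha), math.cos(alpha)  # normal to the circle
    path = [(0.0, x1, x2)]
    for k in range(int(round(tau_end / dtau))):
        u = U * sign(lam1 * x2 - lam2 * x1)  # pursuer: bang-bang
        v = math.atan2(lam1, lam2)           # evader: along the costate
        dx1 = u * x2 - e * math.sin(v)       # minus the forward x1 rate
        dx2 = -u * x1 - e * math.cos(v) + p  # minus the forward x2 rate
        dlam1 = u * lam2                     # reverse-time costate rates
        dlam2 = -u * lam1
        x1, x2 = x1 + dtau * dx1, x2 + dtau * dx2
        lam1, lam2 = lam1 + dtau * dlam1, lam2 + dtau * dlam2
        path.append(((k + 1) * dtau, x1, x2))
    return path


# Hit point dead ahead (alpha = 0): a pure tail chase, so the range grows
# at the rate p - e in reverse time.
tau, x1, x2 = retrograde_path(0.0, p=2.0, e=1.0, r=1.0, U=1.0, tau_end=2.0)[-1]
print(tau, x1, x2)
```

A shooting loop over the hit angle, comparing the end of each retrograde path with the given initial condition, would play the role of the trial and error procedure mentioned above.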

Due to the particular example, the exact value of the Lagrange
multiplier $\nu$ is irrelevant. In most games, finding a suitable value
of $\nu$ that satisfies (13) is an acute problem since the multiplication
of the costate vector by the controls yields, even in the simplest
case, a second-order equation in $\nu$, often including non-linear
elements such as absolute values, which may produce zero, one or two
possible solutions. Though a global study, even for the simplest
linear game, is nearly impossible, only one value of $\nu$ at a time seems
to be the rule since possible values of $\nu$ are checked against two
important conditions: the playability and capturability conditions.


The playability condition states that, if pursuer P captures
E at $t = t_f$, and if $g(x(t_f)) \le 0$ describes the terminal manifold,
then the scalar product

$$ \nabla g(x(t_f)) \cdot \dot{x}(t_f) < 0 \tag{18} $$

is negative. The playability condition expresses the simple fact that,
in order for capture to be performed, the evader must penetrate into
the terminal manifold. When the terminal manifold is a circle, $x(t_f)$
is a normal vector to the terminal manifold, parallel to the costate
vector since the transversality condition holds. Then, an equivalent
form is

$$ \lambda(t_f) \cdot \dot{x}(t_f) < 0 . \tag{19} $$

The capturability condition is defined as

$$ H(t_f) - k < 0 , \tag{20} $$

with $k$ a positive constant. Compared with (13), the capturability
condition appears trivial. For minimum-time problems, it is
convenient to choose $k = 1$. Then, (20) and (19) are equivalent
conditions for one-pursuer-one-evader games.

Nevertheless, capturability is a global condition on the game,
whereas the playability condition must be respected by every single
pursuer actively participating in the capture; thus, for team games,
these two conditions are clearly different. The distinction between
capturability and playability has been overlooked so far in the study
of one-versus-one games.

For the two-car problem, the condition expressed by (19) gives
$e \le p$, the obvious requirement that the pursuer be faster than the
evader.

When the capture area is defined by a time-invariant equation,
it is convenient to apply the playability condition to the terminal
manifold. In some instances, it defines the portion of the terminal
manifold that can be used in order to capture the evader, named "the
usable part of the terminal manifold". The usable part is not a
characteristic of the game studied but depends rather on the way the
game is studied. For the homicidal chauffeur game, the usable part
can easily be computed as the part of the circle for which

$$ x_2(t_f) \ge \frac{re}{p} . \tag{21} $$
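The bound (21) can be checked from the range rate. Assuming the reduced dynamics (6), the rate of change of the distance |x| is (x1*dx1 + x2*dx2)/|x|; the terms in u cancel, and the largest value the evader can enforce through v is e - p*x2/|x|. Requiring this quantity to be non-positive on the circle |x| = r gives x2 >= r*e/p. The Python sketch below encodes this test; the function names and the numerical values are assumptions of the example.

```python
import math


def max_range_rate(x1, x2, p, e):
    """Largest range rate the evader can enforce at (x1, x2) under (6)."""
    return e - p * x2 / math.hypot(x1, x2)


def is_usable(x1, x2, p, e, tol=1e-12):
    """True if the boundary point (x1, x2) satisfies x2 >= r*e/p, as in (21).

    The radius r is taken as the distance of the point from the origin.
    """
    r = math.hypot(x1, x2)
    return x2 >= r * e / p - tol


# Hypothetical values p = 2, e = 1 on the unit circle.
print(is_usable(0.0, 1.0, 2.0, 1.0), is_usable(1.0, 0.0, 2.0, 1.0))
```

At the two end points of the usable part, where x2 = r*e/p, the best range rate available to the evader is exactly zero, which is where the limiting trajectories of the game of kind originate.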

The optimal trajectories conducted from the two points
$x_2(t_f) = re/p$ are referred to as "semi-permeable lines". These lines
separate capture from escape and are best characterized by the fact
that the payoff is discontinuous across them. Permeability ensures
that only under a non-optimal play can the trajectory cross that
border line. If the semi-permeable lines intersect, the capture zone
is finite. Solving $x_1 = 0$ with (21) and (17) gives the condition
on $r$ under which the semi-permeable lines intersect.

Figures 2 and 3 show two possible sets of trajectories. In the
more interesting case of Figure 2, singular behaviors are numerous.
The semi-permeable lines stop at points C and C'; consequently, if the
evader is located around point E, a swerve motion is adopted by P,
turning left, away from the evader at first, in order to go around C,
and then right to face the evader and achieve capture in a linear
course. At points B and B', an infinite number of optimal strategies
are available. Along the lower half of axis $x_2$, the optimal value
for $u$ is $U$ or $-U$, whereas the upper half corresponds to a more
classical singular solution $u^* = 0$. Thus, the study of the
singularities nearly represents the whole effort in solving the two-car
problem.

Figure 3. Homicidal chauffeur game with restricted capture zone.
(Legend: usable part of the terminal manifold; semi-permeable line;
trajectory.)

3.3  Further  studies  of  the  homicidal  chauffeur  game 


Unlike the choice made so far, an obvious reduction choice to
study the N-versus-one homicidal chauffeur game is to position the
evader at the origin. If all the capture sets are circles of
identical radius, an equivalent "safety set" surrounding the evader can
be defined. Then, the state equations are

$$ \dot{x}_1 = x_2 v + p\sin(u) , \quad \dot{x}_2 = -x_1 v + p\cos(u) - e . \tag{22} $$

When $p > e$, the usable part of the terminal manifold is not restricted
and consequently, the part of the study involving singular behavior
such as the semi-permeable lines cannot be conducted. Trajectories,
from a hit angle $\alpha$, are integrated as

$$ x_1 = -e + e\cos(V\tau) + p(r + \tau)\sin(-V\tau + \alpha) , \quad x_2 = -e\sin(V\tau) + p(r + \tau)\cos(-V\tau + \alpha) , \tag{23} $$

and are plotted in Figure 4.

Figure 4. Reversed homicidal chauffeur game.


Trigonometric functions as in (6) are quite cumbersome to deal
with; a linear or bilinear set of differential equations approximating
the exact game would simplify the study of the team game. The
homicidal chauffeur game is approximately equivalent to an
extended-state bilinear model if the control $v$ is constrained as

$$ |v| \le V = 0.3 . \tag{24} $$

Then, the state equations for that game are

$$ \dot{x} = Ax + Bxu + Cv , \quad A = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & e-p \\ 0 & 0 & 0 \end{bmatrix} , \quad B = \begin{bmatrix} 0 & -1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix} , \quad C = \begin{bmatrix} e \\ 0 \\ 0 \end{bmatrix} . \tag{25} $$

A derivation of the solution can be conducted in a similar fashion as
previously. The trajectories, shown in Figure 5, are sections of
circles whose centers and radii change at every switching of the
controls.

Figure 5. Bilinear approximation of the homicidal chauffeur game.
(Legend: validity band; usable part; constant terminal time; optimal
trajectory; semi-permeable line; switching line of u; switching line
of v.)
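The quality of the bilinear model inside the validity band can be gauged by comparing its right-hand side with that of the exact reduced equations. The Python sketch below is an illustration under explicit assumptions: the extended state is taken as (x1, x2, 1), the bilinear model replaces sin(v) by v and cos(v) by 1, and the sign of the entries of B is chosen to agree with the exact equations; the function names and numerical values are hypothetical.

```python
import math


def exact_rhs(x1, x2, u, v, p, e):
    """Exact reduced dynamics with trigonometric evader terms."""
    return (-u * x2 + e * math.sin(v), u * x1 + e * math.cos(v) - p)


def bilinear_rhs(x1, x2, u, v, p, e):
    """Bilinear model dx/dt = A x + B x u + C v on the state (x1, x2, 1)."""
    x = (x1, x2, 1.0)
    A = ((0.0, 0.0, 0.0), (0.0, 0.0, e - p), (0.0, 0.0, 0.0))
    B = ((0.0, -1.0, 0.0), (1.0, 0.0, 0.0), (0.0, 0.0, 0.0))
    C = (e, 0.0, 0.0)
    dx = []
    for i in range(3):
        ax = sum(A[i][j] * x[j] for j in range(3))
        bx = sum(B[i][j] * x[j] for j in range(3))
        dx.append(ax + bx * u + C[i] * v)
    return dx[0], dx[1]  # the third state stays constant


# Model error at the edge of the validity band, v = 0.3, for e = 1.
ex = exact_rhs(0.3, 1.2, 0.5, 0.3, 2.0, 1.0)
bl = bilinear_rhs(0.3, 1.2, 0.5, 0.3, 2.0, 1.0)
print(abs(ex[0] - bl[0]), abs(ex[1] - bl[1]))
```

The error on the first component is of third order in v while the error on the second is of second order, so the second component dominates the mismatch at the edge of the band.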


The non-reduced trajectories would show that the evader adopts
a nearly parallel course to the pursuer, because of the constraint
on $v$. Nevertheless, it is remarkable that the same type of
trajectories and policies are found for the bilinear approximation of the
homicidal chauffeur game even outside the area defined by $|v| \le 0.3$,
in which the approximation is valid. On the other hand, the validity
band only includes the terminal end game maneuvers, thereby excluding
the interesting phases of the pursuit.

Actually, the vectogram corresponding to the original homicidal
chauffeur game is given by the vector $(\sin(v), \cos(v))$, whereas for
the bilinear approximation, it is given by the vector $(v, 1)$ for
$|v| \le 0.3$. As Figure 6 shows, it is a rather poor approximation. A
better approximation is the square vectogram approximation. To give a
more convenient set of state equations, the circle is approximated as
closely as possible with the simplest form possible. The new
equations are


$$ \dot{x}_1 = -u x_2 + a v_1 , \quad \dot{x}_2 = u x_1 + a v_2 - p , \tag{26} $$


where  the  controls  are  constrained  as 


$|u| \le U$ ,

Kil  +  1^2!  -  ®  • 


(27) 


Actually, only the four vectors corresponding to the corners of the
square are used, the evader switching from one to the other, as
Figure 7 shows. The results show the same general behavior
as the exact solution. The discretization performed by the
approximation on the control $v$ causes some trouble in the stabilization
of the trajectories about the semi-permeable line. In particular,
the use of the Euler formula to integrate the differential equations
produces cleaner switches than the fourth-order Runge-Kutta method,
because of the large number of switches required. In order for
this vectogram approximation to be an optimal fit, the parameter $a$
can be optimized. Dolezal [11] addresses this type of parameter
optimization problem prior to the completion of a game. The
performance index to be minimized may become complex. By changing the
value of $a$, simulation shows that it seems to be always possible to
approximate fairly closely parts of the exact solution, but significant
deviations always occurred at some point.

Figure 6. Vectogram approximations. A. Original game. B. Bilinear
approximation. C. Square vectogram approximation.

Figure 7. Square vectogram approximation of the homicidal chauffeur
game. (Legend: usable part; optimal trajectory; semi-permeable line;
switching line for u; switching line for v.)

The difficulties in approximating the equations of the homicidal
chauffeur game are quite typical of differential games. Linearizations
around open-loop trajectories are hazardous since the other
player may choose a tailored (closed-loop) strategy such that the
result will significantly deviate. Local minimax properties cannot
be claimed safely, due to this generic unstable behavior of
differential games. Open-loop strategies are not interesting from a
practical point of view. Closed-loop, or feedback, strategies are
more useful, not only because they make it possible to cope with
uncertainties in the position, parameters, etc., but also because they
allow recovery from a deliberate non-optimal play of the opponent or
exploitation of an error committed by the other party. Minimum-time,
one-versus-one pursuit-evasion games that have a convex terminal
manifold can easily be studied globally, the trajectories being, in
essence, closed loop. This is not the case for games involving a fixed
terminal time or integral constraints on the controls or states. In
this latter case, a particular trajectory corresponds to each point, a
complete state-plane representation of the trajectories is impossible,
and, in the event of a detected non-optimal play, a new open-loop
solution must be recomputed.

The classical method uses the calculus of variations to solve
the game of degree; thus, the game of kind must have been solved
first. Hence the weakness of the method proposed so far, which
requires a qualitative study beforehand. A global study of the
problem has been suggested, applying the Lyapunov theory together
with dynamical systems. Unfortunately, quoting Skowronski [12], the
qualitative study is still at an infancy stage.

Classical results are extended by Simaan and Cruz [13] to
continuous games in which the state is available only at discrete
instants of time, while Regade and Sama [14] show that the optimal
payoff of linear differential games under partial observation is not
altered from the complete observation case, an obvious property of
open-loop policies.

3.4  Two-pursuer-one-evader  homicidal  chauffeur  game 

Two-pursuer-one-evader homicidal chauffeur games amenable to a
solution by way of simple heuristic geometrical considerations are
studied here, in an attempt to demonstrate what can be done without
any thorough or even partial study of team games.

The reversed one-versus-one homicidal chauffeur game is solved
classically as above; cases where the two pursuers are symmetrically
disposed with respect to the orientation of the evader are
investigated.

A direct pursuit occurs when E is located on the median of $P_1P_2$,
and properly oriented, according to Figure 8. Then, the game is
symmetrical and the optimal play for E is to move straight along
$x_2$. This particular two-versus-one homicidal chauffeur game is
equivalent to the one-versus-one case of the "wall pursuit game" in
Isaacs [3], where E, constrained to $x_2$, is chased by $P_1$ alone.
Trivial geometrical considerations are used to solve that game
according to Figure 9.


A collision dilemma arises whenever E and the symmetrical pair
face each other. Then, E must compare its payoff for two strategies:
going straight ahead, or turning either right or left as sharply as
possible. If E decides to turn, say to the right, he will favor one
of the pursuers, namely $P_1$, and the game will be concluded by that
very pursuer. Though inactive, the other pursuer is responsible for
having forced E to turn towards $P_1$ and not away from him. That
optimal trajectory is given by the one-versus-one game in which the
regressive path trajectories from the terminal hit points are not
stopped, for symmetry reasons, along the axis. Two examples of the
collision dilemma to be solved by E are shown in Figure 10.


First example:
    $P_1$ alone                     : $t_f$ = 2.67
    $P_1$ & $P_2$, E turns          : $t_f$ = 2.09
    $P_1$ & $P_2$, E moves straight : $t_f$ = 1.74
    E must turn.

Second example:
    $P_1$ alone                     : $t_f$ = 1.34
    $P_1$ & $P_2$, E turns          : $t_f$ = 1.02
    $P_1$ & $P_2$, E moves straight : $t_f$ = 1.07
    E must move straight.

Figure 10. The collision dilemma.
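The comparison made by E in the collision dilemma can be sketched in a few lines of Python. This is only an illustration: the function name `evader_choice` and the dictionary layout are assumptions introduced here, and the capture times are the ones listed with Figure 10. The evader is taken to prefer the option with the longest capture time.

```python
def evader_choice(capture_times):
    """Return the option that delays capture the most.

    capture_times maps an option name (hypothetical labels) to the
    capture time obtained when E adopts that option.
    """
    return max(capture_times, key=capture_times.get)


# Capture times listed with Figure 10 (two pursuers present).
first_example = {"turn": 2.09, "straight": 1.74}
second_example = {"turn": 1.02, "straight": 1.07}

print(evader_choice(first_example))
print(evader_choice(second_example))
```

In both examples the presence of the second pursuer shortens the capture time obtained by the single pursuer (2.67 and 1.34), whichever option E selects.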
When the pursuers are not symmetrically disposed, a straight
optimal trajectory of E is possible if $P_1$ and $P_2$ are located on
the circle of center $(0, e\,t_f)$ and radius $(r + t_f)$. Then, E
oriented away from the line $P_1P_2$ is an obvious condition to ensure
the straight-line play. For $P_1$ fixed, three possible situations
can arise, depending on the location of $P_2$ along the circle. Two
of them are solved, i.e. the straight-line play and the sharp-turn
play, while the third situation corresponds to a "moderate" turn of
E. Figure 11 depicts these three possibilities.

Even though a wide variety of cases can be solved using simple
considerations as above, a method to solve general situations for
team games is wanting.


III.  INTRODUCTION  TO  TEAM  DIFFERENTIAL  GAMES 


1. INTRODUCTION


Not very many team games are analyzed in the literature. To name
a few, Isaacs [3] derives sufficient conditions in order for a team
of patrollers to block a channel, and Hagedorn and Breakwell [15]
study the game in which a fast evader attempts to pass between two
pursuers, dividing the game into two distinct phases: one in which
the three players move in a straight course, and one in which the
evader stays at a constant distance from its closest pursuer.

The same partitioning of the game into distinct phases allowed
Foley and Schmitendorf [16] to solve a differential game with two
pursuers and one evader, but in a non-zero-sum game formulation, with
a performance index for each pursuer. On the other hand, linear
many-pursuer, one-evader games are studied by Pshenichnyi, Chikrii
and Rappoport [17], with a unique performance index, but where all
pursuers must capture the evader individually, and thus the evader
strives against the slowest pursuer. These team games are solved
because a convenient mapping from the one-versus-one game is possible,
not unlike the example of the two-pursuer, one-evader homicidal
chauffeur game in the previous chapter.

Together with a discussion of the major difficulties in the
statement of team differential games, this chapter formally applies
the calculus of variations to derive the necessary conditions of
optimality for team differential games.


2.  FACTORS  TO  CONSIDER  TO  STATE  A  TEAM  GAME 


In most practical situations, the controls are physically bounded,
notable exceptions being systems without inertia that are controlled
directly through their heading angle, as in the homicidal chauffeur
game. Unbounded controls are more suitable to derive the feedback
strategies and to avoid the switching functions that come together
with bang-bang controls, hence the popularity of quadratic performance
indices that provide a self-limitation in the controls. However,
not every type of quadratic game is suitable to a deterministic
pursuit-evasion game study. As an example, the following bilinear
quadratic game has a relative state

$$ \dot{x} = Ax + Bxu + Cv , \qquad (28) $$

a  quadratic  performance  index 

$$ J = \int (\,\ldots\,)\,dt , \qquad (29) $$

and capture is achieved whenever the evader is forced within the
circle of radius r. A complete study of this two-player game would
show that the capture zone is finite and that the Nash equilibrium
strategies within that zone are u = v = 0. Thus, either capture is
impossible, or it is bound to happen, such that no action is to be
taken by either player. It is quite a paradox to find such a trivial
solution to such a seemingly complex game. The difficulty is in the
merging of the pursuit-evasion concept with the quadratic performance
index. If only capture matters, the value of the control used by
the evader becomes infinite, but so would, immediately, the pursuer's
control. It would achieve the same result as choosing u = v = 0 but
for a possibly different value of J. This is an example of a game in
which the perfect information structure prevents any evolution and
hampers the interpretation of the results.

It seems that a better approach is to apply integral constraints
on the controls. The first effect is to allow only open-loop
solutions since, in most cases, the players try to spend their whole
control capabilities, and then capture zones, singular behaviors,
etc., become very fuzzy. If the bilinear quadratic game above is used
as a representation of the homicidal chauffeur game, according to the
previous chapter, then the controls are, in fact, angles, and the
very definition of an integral constraint on an angle is rather
peculiar. On the other hand, the previous chapter shows that
substitution of the original controls by a vectogram approximation is
an unsafe manipulation for differential games.

An easy generalization from the one-versus-one games can be
made if the performance indices of the team games include summations
of the controls, states, etc., merely adding the various energies,
etc., spent by the team. While no particular problems arise for
minimum-time team games, and even quadratic team games, fixed
terminal-time games that include a terminal miss distance are more
difficult to generalize since the distance between the evader and the
closest pursuer is to be considered. Thus, most fixed terminal-time
team games include a cumbersome minimum operator in their performance
index as


where the reduced state equation for that game is

$$ \dot{x}_i = f_i(x_i, u_i, v, t) \qquad (31) $$

for pursuer $P_i$, and $g(x_i)$ is the relative distance between the
evader E and $P_i$. An equivalent form for a two-versus-one game is

$$ J = \int_{t_0}^{t_f} \big( w\,g(x_1(t)) + (1-w)\,g(x_2(t)) + L(u_i, v, t) \big)\,dt \qquad (32) $$

by introducing the switching variable w, such that

$$ w = 1 \ \text{for}\ g(x_1(t)) < g(x_2(t)) \quad \text{and} \quad w = 0 \ \text{for}\ g(x_1(t)) \ge g(x_2(t)) . \qquad (33) $$


w can be viewed as a control variable belonging to a mythical
minimizing third party, such that (33) is a Nash strategy. It can be
shown that the equivalent game requires the adjunction of a state y
such that

$$ \dot{y} = w\,\big( g(x_1(t)) - g(x_2(t)) \big) , \qquad (34) $$

and the new performance index is

$$ J = y(t_f) + \int_{t_0}^{t_f} \big( g(x_2(t)) + L(x_i, u_i, v, t) \big)\,dt . \qquad (35) $$

Here w is constrained to the interval [0,1]. Thus, the
two-versus-one team game with the non-linear minimum operator in the
performance index is equivalent to a three-versus-one augmented team
game with the simplified performance index (35).
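The equivalence between the minimum operator and the augmented state y can be checked numerically. The sketch below is only an illustration under stated assumptions: the two distance histories `g1` and `g2` are arbitrary functions of time invented for the check (they are not taken from any game in this report), y is assumed to start from zero, and the term L, which is common to both forms, is left out. It relies on the identity min(g1, g2) = g2 + w (g1 - g2) with w chosen as in (33).

```python
import math


def g1(t):
    # Hypothetical distance history between E and the first pursuer.
    return 1.0 + math.sin(t)


def g2(t):
    # Hypothetical distance history between E and the second pursuer.
    return 1.5 - 0.5 * t


def compare_forms(t0, tf, steps=20000):
    """Integrate min(g1, g2) directly and through the augmented state y."""
    dt = (tf - t0) / steps
    direct = 0.0   # integral of min(g1, g2)
    y = 0.0        # augmented state, dy/dt = w (g1 - g2), y(t0) = 0
    rest = 0.0     # integral of g2 alone
    for k in range(steps):
        t = t0 + (k + 0.5) * dt          # midpoint rule
        a, b = g1(t), g2(t)
        w = 1.0 if a < b else 0.0        # switching rule (33)
        direct += min(a, b) * dt
        y += w * (a - b) * dt
        rest += b * dt
    return direct, y + rest


direct, augmented = compare_forms(0.0, 3.0)
print(direct, augmented)
```

Both numbers agree up to rounding, which is the content of the equivalence: the minimizing "third party" that controls w reproduces the minimum operator.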

Actually, it can be proved that N-versus-one team games in which
a term such as $\min(a_1, a_2, \ldots, a_m)$ is included in the
performance index, where $a_1, a_2, \ldots, a_m$ are m independent
linear functions of the states, are equivalent to simpler forms of
(N + m - 1)-versus-one team differential games. If
$a_1, a_2, \ldots, a_m$ are not linearly independent, then it is
possible to find a set $(b_1, b_2, \ldots, b_\ell)$ of independent
vectors such that, for $\ell < m$,

$$ \min(a_1, a_2, \ldots, a_m) = \min(b_1, b_2, \ldots, b_\ell) . \qquad (36) $$

On the other hand, particular games for which the control of the
evader affects independently (m-1) of the state components of the
functions $(a_1, a_2, \ldots, a_m)$ proceed according to m distinct
phases: first E plays according to the "closest" pursuer in the sense
of the minimum function, until a second pursuer is just as close;
then, E plays to keep both pursuers equally distant until a third
pursuer becomes equally distant, and so on. At the terminal time,
$a_1 = a_2 = \ldots = a_m$. The games studied by Hagedorn and
Breakwell [15] and Foley and Schmitendorf [16] belong to this
category. If the above is not met, then chances are that, for some
initial conditions, $a_i < a_j$ holds for some pair i, j, thereby
reducing the number of arguments of the minimum operator.

The proof of this theorem is rather easy and will not be
presented here. It provides a technique to put team games under more
classically tractable forms, at the expense of an increase in the
dimensionality. In the sequel, team games are assumed to have been
put under this form; thus, cumbersome minimum operators are not
considered in further studies.

Another problem is the definition of the differential equations
describing the game. In a pursuit-evasion game, only the relative
state between the evader and the pursuers is important in an
unconstrained space. Therefore, there is no need to carry the
equations of motion of every single player, when a so-called reduced
state is simpler. The reduction has long been recognized as one of
the major steps in studying such a differential game as a
pursuit-evasion game, as, for example, Pontryagin [18] shows clearly.

Nevertheless, various approaches are possible. First is the
brute force manner whereby the equations describing the motion of
each player are kept, at the expense of an unwelcome increase in the
dimensions of the game. Moreover, keeping track of the positions of
the players and of their associated terminal manifolds is a lot more
difficult, as is the guessing of the terminal state, so important in
the solution of the N-point boundary-value problems using optimal
control theory. An excellent way to avoid any trouble is to start
from a reduced set of equations. Sometimes, a proper choice of
coordinate systems can simplify the problem or the controls, but this
depends on the type of study conducted.

3.  NECESSARY  CONDITIONS  OF  OPTIMALITY  FOR 
GENERALIZED  TEAM  GAMES 

3.1  Presentation 

In the following, the notations and the general method adopted
are borrowed from Bryson and Ho [19]. The free terminal time game
obeys the differential equations

$$ \dot{x} = f(x,u,v,t) , \qquad (37) $$

and $x(t_0) = x_0$ is given. The unconstrained controls are u for the
pursuer and v for the evader. State constraints

$$ \psi(x(t_f), t_f) = 0 , \qquad (38) $$

are adjoined to the performance index by the set of Lagrange
multipliers $\nu_p$ and $\nu_e$. The more general non-zero-sum game is
studied; then the performance index for the pursuer takes the form

$$ J_p = \phi_p(x(t_f),t_f) + \nu_p^T \psi(x(t_f),t_f) + \int_{t_0}^{t_f} \big( L_p(x,u,v,t) + \lambda_p^T ( f(x,u,v,t) - \dot{x} ) \big)\,dt \qquad (39) $$

and for the evader

$$ J_e = -\phi_e(x(t_f),t_f) - \nu_e^T \psi(x(t_f),t_f) - \int_{t_0}^{t_f} \big( L_e(x,u,v,t) - \lambda_e^T ( f(x,u,v,t) - \dot{x} ) \big)\,dt , \qquad (40) $$

where $\phi_e$ and $\phi_p$ are terminal payoffs and $L_e$, $L_p$ are
integral functions. The Lagrange multipliers $\nu_e$ and $\nu_p$ have
the dimensions of the vector $\psi$.

From now on, the various arguments are omitted for brevity.

For minimum-time linear team games where only the pursuers that
perform capture are present, the results derived by Pshenichnyi,
Chikrii and Rappoport [17], further generalized by Satimov, Azamov
and Khaidarov [20], can be referred to, stating sufficient conditions
that ensure the existence of a solution in finite time.


3.2  Stationarity  for  the  one-versus-one  game 


Stationarity of $dJ_e$ and $dJ_p$ is expressed taking into account
differential changes in the terminal time.


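For the pursuer, it yields

$$
\begin{aligned}
dJ_p ={}& \Big( \frac{\partial \phi_p}{\partial t}\,dt + \nu_p^T \frac{\partial \psi}{\partial t}\,dt + \frac{\partial \phi_p}{\partial x}\,dx + \nu_p^T \frac{\partial \psi}{\partial x}\,dx \Big)_{t_f} + (L_p)_{t_f}\,dt_f - (L_p)_{t_0}\,dt_0 \\
&+ \big( \lambda_p^T (f - \dot{x}) \big)_{t_f}\,dt_f - \big( \lambda_p^T (f - \dot{x}) \big)_{t_0}\,dt_0 \\
&+ \int_{t_0}^{t_f} \Big( \frac{\partial L_p}{\partial x}\,\delta x + \lambda_p^T \frac{\partial f}{\partial x}\,\delta x - \lambda_p^T\,\delta\dot{x} + \Big( \frac{\partial L_p}{\partial u}\,\delta u + \lambda_p^T \frac{\partial f}{\partial u}\,\delta u \Big) + \Big( \frac{\partial L_p}{\partial v}\,\delta v + \lambda_p^T \frac{\partial f}{\partial v}\,\delta v \Big) \Big)\,dt . \qquad (41)
\end{aligned}
$$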


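At time $t = t_0$ and $t = t_f$,

$$ \lambda_p^T \big( f(x,u,v,t) - \dot{x} \big) = 0 , $$

and, introducing the Hamiltonian

$$ H_p = L_p + \lambda_p^T f , $$

together with the function

$$ \Phi_p = \phi_p + \nu_p^T \psi , $$

(41) simplifies into

$$
dJ_p = \Big[ \Big( \frac{\partial \Phi_p}{\partial t} + L_p \Big) dt + \frac{\partial \Phi_p}{\partial x}\,dx \Big]_{t_f} - (L_p)_{t_0}\,dt_0 + \int_{t_0}^{t_f} \Big( \frac{\partial H_p}{\partial x}\,\delta x + \frac{\partial H_p}{\partial u}\,\delta u + \frac{\partial H_p}{\partial v}\,\delta v - \lambda_p^T\,\delta\dot{x} \Big)\,dt .
$$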


From Figure 12,

$$ dx(t_f) = \delta x(t_f) + \dot{x}(t_f)\,dt_f . $$

Figure 12. Terminal time differentials.


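Then, integrating by parts, the following equality holds:

$$
\int_{t_0}^{t_f} -\lambda_p^T\,\delta\dot{x}\,dt = -(\lambda_p^T\,dx)_{t_f} + (\lambda_p^T \dot{x}\,dt)_{t_f} + (\lambda_p^T\,\delta x)_{t_0} + \int_{t_0}^{t_f} \dot{\lambda}_p^T\,\delta x\,dt ,
$$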
to  P 


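together with (45), it yields

$$
\begin{aligned}
dJ_p ={}& \Big( \frac{\partial \Phi_p}{\partial t} + L_p + \lambda_p^T \dot{x} \Big)_{t_f} dt_f + \Big( \Big( \frac{\partial \Phi_p}{\partial x} - \lambda_p^T \Big) dx \Big)_{t_f} + (\lambda_p^T\,\delta x)_{t_0} - (L_p)_{t_0}\,dt_0 \\
&+ \int_{t_0}^{t_f} \Big( \Big( \frac{\partial H_p}{\partial x} + \dot{\lambda}_p^T \Big) \delta x + \frac{\partial H_p}{\partial u}\,\delta u + \frac{\partial H_p}{\partial v}\,\delta v \Big)\,dt ,
\end{aligned}
$$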


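while a similar derivation yields

$$
\begin{aligned}
-dJ_e ={}& \Big( \frac{\partial \Phi_e}{\partial t} + L_e - \lambda_e^T \dot{x} \Big)_{t_f} dt_f + \Big( \Big( \frac{\partial \Phi_e}{\partial x} + \lambda_e^T \Big) dx \Big)_{t_f} - (\lambda_e^T\,\delta x)_{t_0} - (L_e)_{t_0}\,dt_0 \\
&- \int_{t_0}^{t_f} \Big( \Big( \frac{\partial H_e}{\partial x} + \dot{\lambda}_e^T \Big) \delta x + \frac{\partial H_e}{\partial u}\,\delta u + \frac{\partial H_e}{\partial v}\,\delta v \Big)\,dt .
\end{aligned}
$$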
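In order for the game to have a stationary value, the coefficients of
$dt_f$ and $dx$ must vanish for both $dJ_p$ and $dJ_e$, regardless of
t. Therefore, these terms must be individually set to zero, that is

$$
\begin{aligned}
&\frac{\partial H_p}{\partial x} = -\dot{\lambda}_p^T , \qquad \frac{\partial H_e}{\partial x} = -\dot{\lambda}_e^T , \\
&\Big( \frac{\partial \Phi_p}{\partial x} \Big)_{t_f} = (\lambda_p^T)_{t_f} , \qquad \Big( \frac{\partial \Phi_e}{\partial x} \Big)_{t_f} = -(\lambda_e^T)_{t_f} , \\
&\Big( \frac{\partial \Phi_p}{\partial t} + L_p + \lambda_p^T \dot{x} \Big)_{t_f} = \Big( \frac{\partial \Phi_p}{\partial t} + H_p \Big)_{t_f} = 0 , \\
&\Big( \frac{\partial \Phi_e}{\partial t} + L_e - \lambda_e^T \dot{x} \Big)_{t_f} = \Big( \frac{\partial \Phi_e}{\partial t} - H_e \Big)_{t_f} = 0 ,
\end{aligned}
\qquad (50)
$$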


or, since

$$ \frac{\partial \Phi_p}{\partial t} = \frac{d \Phi_p}{dt} - \frac{\partial \Phi_p}{\partial x}\,\dot{x} , \qquad (51) $$


an equivalent condition is

$$ \Big( \frac{d \Phi_p}{dt} + L_p \Big)_{t_f} = 0 , \qquad \Big( \frac{d \Phi_e}{dt} + L_e \Big)_{t_f} = 0 . \qquad (52) $$


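Then, if the above holds, $dJ_p$ and $dJ_e$ are left to be

$$
\begin{aligned}
dJ_p &= \int_{t_0}^{t_f} \Big( \frac{\partial H_p}{\partial u}\,\delta u + \frac{\partial H_p}{\partial v}\,\delta v \Big)\,dt + (\lambda_p^T\,dx)_{t_0} - (H_p)_{t_0}\,dt_0 , \\
dJ_e &= \int_{t_0}^{t_f} \Big( \frac{\partial H_e}{\partial u}\,\delta u + \frac{\partial H_e}{\partial v}\,\delta v \Big)\,dt + (\lambda_e^T\,dx)_{t_0} + (H_e)_{t_0}\,dt_0 .
\end{aligned}
\qquad (53)
$$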


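A stationary value for both $J_p$ and $J_e$ requires that

$$
\frac{\partial H_p}{\partial u}\,\delta u + \frac{\partial H_p}{\partial v}\,\delta v = 0
\quad \text{and} \quad
\frac{\partial H_e}{\partial u}\,\delta u + \frac{\partial H_e}{\partial v}\,\delta v = 0 . \qquad (54)
$$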


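The Nash equilibrium strategy is defined by

$$
\frac{\partial H_p}{\partial u} = 0 \ \text{with}\ v = v^* \ \text{given}
\quad \text{and} \quad
\frac{\partial H_e}{\partial v} = 0 \ \text{with}\ u = u^* \ \text{given} . \qquad (55)
$$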

It achieves the desired stationarity, but it must be understood
that the controls $u^*$, $v^*$ that satisfy the above are candidates
that must also verify the Nash inequality:

$$
\forall u \in U : \ H_p(u^*, v^*) \le H_p(u, v^*)
\quad \text{and} \quad
\forall v \in V : \ H_e(u^*, v) \le H_e(u^*, v^*) . \qquad (56)
$$

The necessary conditions applied to the one-versus-one differential
game define a two-point boundary-value problem. For a generalized
pursuit-evasion game, $\phi_p = \phi_e = \phi$ and $L_p = L_e = L$
hold, and the equations describing the terminal manifold are usually
one-dimensional; then, $\nu_p$ and $\nu_e$ are scalars, moreover
$\lambda_p = -\lambda_e = \lambda$, $\nu_p = -\nu_e = \nu$ and
$H_p = -H_e = H$.


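Then, together with the Nash inequality, the necessary conditions
of optimality are

$$
\begin{aligned}
&\Big( \frac{\partial \phi}{\partial t} + \nu \frac{\partial \psi}{\partial t} + L + \lambda^T \dot{x} \Big)_{t_f} = \Big( \frac{\partial \phi}{\partial t} + \nu \frac{\partial \psi}{\partial t} + H \Big)_{t_f} = 0 , \\
&\frac{\partial H}{\partial x} = -\dot{\lambda}^T , \qquad \Big( \frac{\partial \phi}{\partial x} + \nu \frac{\partial \psi}{\partial x} \Big)_{t_f} = (\lambda^T)_{t_f} , \\
&\frac{\partial H}{\partial u} = 0 , \qquad \frac{\partial H}{\partial v} = 0 .
\end{aligned}
$$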

3.3  Stationarity  for  the  N-versus-one  team  game 


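For the team game opposing the pursuers $P_i$ to E according to the
state differential equations

$$ \dot{x}_i = f_i(x_i, u_i, v, t) , $$

$J_p$ and $J_e$ are defined as

$$
\begin{aligned}
J_p &= \sum_i \phi_{p_i}(x(t_f),t_f) + \sum_i \nu_{p_i}^T \psi_i(x(t_f),t_f) + \int_{t_0}^{t_f} \Big( L_p + \sum_i \lambda_{p_i}^T (f_i - \dot{x}_i) \Big)\,dt , \\
J_e &= -\sum_i \phi_{e_i}(x(t_f),t_f) - \sum_i \nu_{e_i}^T \psi_i(x(t_f),t_f) + \int_{t_0}^{t_f} \Big( -L_e + \sum_i \lambda_{e_i}^T (f_i - \dot{x}_i) \Big)\,dt ,
\end{aligned}
$$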


and the Hamiltonians are

$$ H_p = L_p + \sum_i \lambda_{p_i}^T f_i , \qquad H_e = -L_e + \sum_i \lambda_{e_i}^T f_i , $$


where the relation between the one-versus-one and the N-versus-one
game is

$$ L_p = \sum_i ( L_{p_i} \ldots ) , \qquad L_e = \sum_i ( L_{e_i} - L_{ev_i} ) + L_{ev} , \qquad (61) $$

with, in the one-versus-one case of $P_i$-vs.-E:

$$ \ldots - L_{ev_i}(x_i, v, t) + \lambda_{e_i}^T (f_i - \dot{x}_i) \big)\,dt . \qquad (62) $$


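A step-by-step derivation identical to the one conducted in the
two-player case yields the necessary conditions of optimality for
the team games, as

$$
\begin{aligned}
&\Big( \sum_i \frac{\partial \Phi_{p_i}}{\partial t_f} + L_p + \sum_i \lambda_{p_i}^T \dot{x}_i \Big)_{t_f} = 0 , \qquad
\Big( \sum_i \frac{\partial \Phi_{e_i}}{\partial t_f} + L_e - \sum_i \lambda_{e_i}^T \dot{x}_i \Big)_{t_f} = 0 , \\
&\frac{\partial H_p}{\partial x_i} = -\dot{\lambda}_{p_i}^T , \qquad \frac{\partial H_e}{\partial x_i} = -\dot{\lambda}_{e_i}^T , \\
&\Big( \frac{\partial \Phi_{p_i}}{\partial x_i} \Big)_{t_f} = (\lambda_{p_i}^T)_{t_f} , \qquad \Big( \frac{\partial \Phi_{e_i}}{\partial x_i} \Big)_{t_f} = -(\lambda_{e_i}^T)_{t_f} , \\
&\frac{\partial H_p}{\partial u_i} = 0 \ \text{if}\ v \ \text{given} , \qquad \frac{\partial H_e}{\partial v} = 0 \ \text{if}\ u \ \text{given} ,
\end{aligned}
\qquad (63)
$$

in which some equations are expressed N times, once for each pursuer.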
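For this team game, the Nash strategy is expressed as

$$
\begin{aligned}
&J_p(u_1^*, u_2^*, \ldots, u_N^*, v^*) \le J_p(u_1, u_2, \ldots, u_N, v^*) \qquad \forall\, (u_1, u_2, \ldots, u_N) \in U , \\
&J_e(u_1^*, u_2^*, \ldots, u_N^*, v) \le J_e(u_1^*, u_2^*, \ldots, u_N^*, v^*) \qquad \forall\, v \in V ,
\end{aligned}
$$

while, in the general N-player case, it would have been

$$
(\forall\, i \in 1, \ldots, N) \quad J_i(u_1^*, \ldots, u_{i-1}^*, u_i^*, u_{i+1}^*, \ldots, u_N^*) \le J_i(u_1^*, \ldots, u_{i-1}^*, u_i, u_{i+1}^*, \ldots, u_N^*) , \quad (\forall\, u_i \in U_i) .
$$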


The  distinction  between  a  team  game  and  an  N-player  game  appears 
clearly. 

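For a generalized team pursuit-evasion game,
$\phi_{p_i} = \phi_{e_i} = \phi_i$, $L_p = L_e = L$ and $\psi_i$ has
dimension one. Then $\lambda_{p_i} = -\lambda_{e_i} = \lambda_i$, and
$-\nu_{e_i} = \nu_{p_i} = \nu_i$.

The necessary conditions of optimality are summarized below:

$$
\begin{aligned}
&\sum_i \Big( \frac{\partial \phi_i}{\partial t} + \nu_i \frac{\partial \psi_i}{\partial t} \Big)_{t_f} + (H)_{t_f} = 0 , \qquad \frac{\partial H}{\partial x_i} = -\dot{\lambda}_i^T , \\
&\Big( \frac{\partial \phi_i}{\partial x_i} + \nu_i \frac{\partial \psi_i}{\partial x_i} \Big)_{t_f} = (\lambda_i^T)_{t_f} , \qquad \frac{\partial H}{\partial u_i} = 0 , \qquad \frac{\partial H}{\partial v} = 0 .
\end{aligned}
\qquad (66)
$$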

1  If  f  1 



The first equation in (66) can only determine one of the N scalars
$\nu_i$. The remaining scalars must be chosen so that the solution to
the equations matches the initial conditions of the game. The (N-1)
unknowns, called strategic variables, are defined as

$$ \nu\, z_i = \nu_i , \qquad z_1 = 1 . \qquad (67) $$

Utilizing this formulation, the first equation in (66) is used to
determine the unique Lagrange multiplier $\nu$.

The strategic variable $z_i$ is such that, when $z_i$ is small,
$P_i$ has nearly no effect on the game, as seen from (43) and because
$\lambda_i$ has a modulus proportional to $z_i$. On the other hand,
if $z_i$ is large, then $P_i$ is more important than the other
players. Thus, there is a clear relation between the importance of a
pursuer and the value of its strategic variable. Conversely, the
value of $z_i$ can be used a posteriori as a selecting device to
choose the relevant players, to simplify a game and to compute
approximate solutions.

The strategic variable is not constant when the evader is
surrounded by the pursuers. That possibility exists whenever there
are at least N pursuers for an evader moving in a space of dimension
(N-1). Then $z_i$ adopts a value such that the control of the evader
disappears from the Hamiltonian.

The main problem in solving the team games with this approach
is to guess the strategic variables properly. The capturability and
playability conditions can be used to restrict the domain of
definition of $z_i$. When a given level of cooperation is required
from a pursuer, the value of its associated strategic variable is
necessarily restricted. However, finding the exact value of the
strategic variable usually requires a painstaking trial and error
procedure. Thus a formula to give a first approximation of $z_i$ is
desired. The correspondence between $z_i$ and the effect of the
pursuers suggests formulae such as

$$ z_i = \frac{P_1E}{P_iE} , \qquad z_i = \frac{(P_1E - r_1)/p_1}{(P_iE - r_i)/p_i} , $$

where $P_iE$ is the distance separating the evader from pursuer $P_i$,
of maximum speed $p_i$.
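The two first-guess formulae can be evaluated directly. The sketch below is only an illustration: the function names are introduced here, the lists are indexed so that entry 0 corresponds to the reference pursuer $P_1$ (for which $z_1 = 1$), and $r_i$ is assumed to be the capture radius of $P_i$. The numerical values are arbitrary.

```python
def guess_from_distances(dist):
    """First guess z_i = P_1E / P_iE, with dist[i] the distance P_iE."""
    return [dist[0] / d for d in dist]


def guess_from_times(dist, radius, speed):
    """First guess z_i = ((P_1E - r_1)/p_1) / ((P_iE - r_i)/p_i).

    Each term (P_iE - r_i)/p_i is a rough time for P_i, moving at its
    maximum speed, to bring E within its capture radius (assumption).
    """
    times = [(d - r) / p for d, r, p in zip(dist, radius, speed)]
    return [times[0] / t for t in times]


# Arbitrary illustrative values: P_2 is twice as far but twice as fast.
distances = [2.0, 4.0]
radii = [0.5, 1.0]
speeds = [1.0, 2.0]

print(guess_from_distances(distances))
print(guess_from_times(distances, radii, speeds))
```

The distance-based guess ranks the farther pursuer as less important, whereas the time-based guess also accounts for its speed and capture radius. Either value is only a starting point for the trial and error procedure.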

Also, it can be shown easily that, under favorable initial
conditions, two slow pursuers can catch a faster evader. Moreover, if
the one-versus-one game ($P_1$,E) admits a solution from $x_1(t_0)$,
then the two-versus-one game with $x_2(t_0)$ is possible under any
value of the strategic variable. Conversely, for any pair
$(x_1(t_0), x_2(t_0))$, there exists a value of the strategic variable
such that the necessary conditions of optimality are satisfied.
Then, it is clear that restricted usable parts of terminal manifolds
can no longer be found and, consequently, the convenient use of the
singularities to study the boundary of the usable part, barriers,
capture zones and particular trajectories does not apply to team
pursuit-evasion games.

Feedback solutions can only be given relative to the other
pursuer; thus, the added dimension due to the strategic variable
prevents the global solution from being represented by a unique state
portrait, since trajectories corresponding to different values of
$z_i$ do intersect.

The most crucial point in this study is that an N-pursuer team
game is not a classical (N+1)-point boundary-value problem because
another (N-1) unknowns, modelled as strategic variables, are to be
found. The team of pursuers minimizes a common performance index;
it defines a single Hamiltonian for the team, and then, whenever a
pursuer is added to the team, one equation is missing. Nevertheless,
the general structure of the equations enables one to take advantage
of the individual 1-vs.-1 games to derive structures of team games.


IV.  THE  GAME  OF  KIND 


1. INTRODUCTION


By analogy with the 1-vs.-1 game, the game of kind study for
team differential pursuit-evasion games could be summarized by
finding the answer to the question "can the pursuit team catch the
evader in finite time?" Sufficient conditions ensuring the existence
of a control to achieve capture in finite time are stated by Satimov,
Azamov and Khaidarov [20]. But another dimension to the game of kind
has to be added, and the next question is "which are the relevant
pursuers?" In other words, how can the pursuers whose presence is
required be distinguished from those whose effect on the optimal
solution is null?

A pursuer $P_i$, attempting to capture an evader E, is expecting
some cooperation from other pursuers. Fixing $P_i$, it is convenient
to partition the space into various zones in which a copursuer would
be expected to behave in a particular way. As the game proceeds, the
zones evolve dynamically. The possible zones of interest are
numerous; therefore the pursuers will be assumed to play optimally,
reducing the number of the zones of interest.

After the various definitions of the zones for the minimum-time
problem, a parallel with the game of degree allows computation of the
most useful zones, the help zones. Approximate formulae are also of
interest since the computation of the exact zones is, at best,
difficult. Finally, an example is treated to illustrate the
procedure.


2.  PURSUER  CLASSES 


Generally, three classes of pursuers can be distinguished as
follows:

i) The first is made of the pursuers that are irrelevant to the
game because of their initial position, their dynamical
characteristics (speed, maneuverability, etc.) or because other
pursuers are located in positions that completely cover the
possibilities of these pursuers.

ii) The second class includes the pursuers that have a temporary
effect on the game. The typical example is a very slow pursuer
located on the course of the evader. This pursuer will deny a direct
course, but once passed, its effect will be null. State constraints
like islands can be modeled as a pursuer of speed equal to zero, with
a lethal area covering the island.

iii)  The  third  class  consists  of  the  pursuers  that  actually  per¬ 
form  the  capture.  It  is  often  implicitly  assumed  that  the  pursuers 
belong  to  this  class. 

The three classes must be studied in both events: an optimal as
well as a non-optimal play of the evader. The actual control that a
pursuer should adopt in the (unlikely) event that the evader does
not play optimally, when an optimal play of the evader would
not allow this pursuer to be useful, is difficult to find since the
very definition of the criterion to optimize is problematic.

The study of class two partitions the game into several
sequences, divided by the time from which the pursuer of class two
no longer has any effect. Very often, the evader is on the border of
the lethal area of this pursuer at that time.

The  global  study  of  the  game  of  kind  helps  in  finding  the  con¬ 
ditions  under  which  capture  is  possible.  These  conditions  on  the 
relative  speeds  and  maneuverability,  ensure  capture  independently  of 
the  initial  state  (capture  condition)  or  not  (playability  condition) . 
Under  particular  initial  conditions,  it  is  possible  for  two  slow 
pursuers  to  catch  a  faster  evader. 

3.  TWO-VERSUS-ONE  PURSUIT-EVASION  ZONES 

The zones represent a parametric study of the pursuit-evasion
game. Many parameters are candidates but the one used is the initial
position of an eventual copursuer. Thus, the other parameters such
as the position of the evader at $t = t_0$ and the characteristics of
the various players are assumed to be known. Another assumption is
the optimality of the strategies of the pursuers.

The specific pair ($P_i$, $P_j$) is separately studied. To a pursuer
$P_i$, six zones, corresponding to six possible cases, are relevant.

The following notations are used:

$x_i^T M_i x_i \le r_i^2$ is the capture condition corresponding to
pursuer $P_i$.

The control vector of pursuer $P_i$ is $u_i$, defined on the set of
admissible control functions $U_i$. The control of the evader is v,
defined on V. Usual definitions of $U_i$ and V are the set of control
vectors bounded by a given maximum norm.


$t^*_{P_i}(v)$ is the capture time of the one-versus-one game $(P_i,E)$, where E applies the control $v$, and $P_i$, initially located at $x_i(t_0)$, plays optimally. $v^*$ is the optimal strategy of E. To each pair $(v, x_i(t_0))$ thus corresponds a time $t^*_{P_i}(v)$.

$t^*_{P_i}(v^*) > t^*_{P_j}(v^*)$ means that the optimal game $(P_i,E)$ finishes in a longer time than the optimal game $(P_j,E)$. Then $P_j$ is a more dangerous player than $P_i$, and $z_i < z_j$ is likely in the 2-vs.-1 $(P_i,P_j,E)$ game. Note that one of the times $t^*_{P_i}(v^*)$ or $t^*_{P_j}(v^*)$ may be infinite if the corresponding pursuer is unable to catch the evader alone.

$Z1P_{j0}(P_i/P_j^*)$ is to be interpreted as zone number 1, where the parameter of study is the position of pursuer $P_j$ at time $t = t_0$, in which $P_j$ is located such that $P_i$ has no effect on the game, given that $P_j$ plays optimally.

In the following, $x_i(t_0)$ is given and the parameter describing the zones is the initial position $x_j(t_0)$ of a possible copursuer.

i) $Z1P_{j0}(P_i/P_j^*) = \{x_j(t_0) \mid x_i^T(t) M_i x_i(t) > r_i^2,\ \forall v \in V,\ \forall t \in [t_0, t^*_{P_j}(v)],\ \forall u_i \in U_i\}$.

If $P_j$ belongs to this zone at $t = t_0$, and the game $(P_j,E)$ is finished in $t^*_{P_j}(v)$, then $P_i$ will not play any role in any game involving $P_j$, provided that $P_j$ plays optimally. In this game, the strategic variable associated with $P_i$ is $z_i = 0$, and $P_i$ is of class one as defined earlier.


ii) $Z2P_{j0}(P_i/P_j^*) = \{x_j(t_0) \mid x_i^T(t) M_i x_i(t) > r_i^2,\ \forall u_i \in U_i,\ \forall t \in [t_0, t^*_{P_j}(v^*)]$, and $\exists u_i \in U_i,\ \exists v \in V,\ \exists t \in [t_0, t^*_{P_j}(v)];\ x_i^T(t) M_i x_i(t) \le r_i^2\}$.

$P_i$ cannot catch an evader playing optimally according to $P_j$, but there exists at least one play of the evader that would enable $P_i$ to capture before $P_j$ does. $P_i$ can play a role in a game involving $P_j$ only if E does not play optimally or if the presence of other pursuers forces E to deviate from the $(P_j,E)$ optimal strategy. If pursuer $P_i$ is of class two as defined earlier, then $P_i$ is able to intercept the trajectory of the 1-vs.-1 $(P_j,E)$ game in the prescribed time; otherwise $P_i$ would not be able to deny the evader a course. The fact that $P_i$ might not achieve capture is here irrelevant: the other member of the team, namely $P_j$, will do it; the question is whether $P_i$ plays a role in the team effort. A pursuer $P_i$ of class two belongs to the next zone.

iii) $Z3P_{j0}(P_i/P_j^*) = \{x_j(t_0) \mid \exists u_i \in U_i,\ \exists t \in [t_0, t^*_{P_j}(v^*)];\ x_i^T(t) M_i x_i(t) \le r_i^2$ and $t^*_{P_j}(v^*) < t^*_{P_i}(v^*)\}$.

If $P_j$ belongs to this zone, then $P_i$ will play a role in the $(P_i,P_j,E)$ game, in which $P_j$ is the primary threat.


iv) $Z4P_{j0}(P_j/P_i^*) = \{x_j(t_0) \mid \exists u_j \in U_j,\ \exists t \in [t_0, t^*_{P_i}(v^*)];\ x_j^T(t) M_j x_j(t) \le r_j^2$ and $t^*_{P_i}(v^*) < t^*_{P_j}(v^*)\}$.

If $P_j$ belongs to this zone, then $P_j$ will play a role in the $(P_i,P_j,E)$ game, in which $P_i$ is the primary threat. A pursuer of class three would belong to Z3 or Z4.


v) $Z5P_{j0}(P_j/P_i^*) = \{x_j(t_0) \mid x_j^T(t) M_j x_j(t) > r_j^2,\ \forall u_j \in U_j,\ \forall t \in [t_0, t^*_{P_i}(v^*)]$, and $\exists u_j \in U_j,\ \exists v \in V,\ \exists t \in [t_0, t^*_{P_i}(v)];\ x_j^T(t) M_j x_j(t) \le r_j^2\}$.

$P_j$ cannot catch an evader playing optimally according to $P_i$, but there exists at least one play of the evader that would enable $P_j$ to capture before $P_i$ does. If $P_j$ belongs to this zone, then $P_j$ can play a role in a game involving $P_i$ only if E does not play optimally or if the presence of other pursuers forces E to deviate from the $(P_i,E)$ optimal strategy.


vi) $Z6P_{j0}(P_j/P_i^*) = \{x_j(t_0) \mid x_j^T(t) M_j x_j(t) > r_j^2,\ \forall u_j \in U_j,\ \forall v \in V,\ \forall t \in [t_0, t^*_{P_i}(v)]\}$.

If $P_j$ belongs to this zone, then $P_j$ will not play any role in any game involving $P_i$, provided that $P_i$ plays optimally. In this zone, the strategic variable $z_j$ associated with $P_j$ is zero, and $P_j$ is of class one.

Section  6.  shows  an  example  of  the  derivation  of  these  zones 
for  the  problem  of  the  cutters  vs.  railroad. 


The definitions above concern the 2-vs.-1 game $(P_i,P_j,E)$. The zones are the same for every pursuer identical to $P_j$ but depend on the position $x_i(t_0)$ of $P_i$ at $t = t_0$, i.e., to a new initial position $x_i(t_0)$ corresponds a completely new set of zones. The same kind of definitions can be derived for a three-vs.-one game, as $Z1P_{k0}(P_k/P_i^*,P_j^*)$, etc. The derivation of these zones is, of course, more complex than in the two-pursuer-versus-one-evader game.

In solving an N-pursuer game, it is hoped that the simple case-by-case study of the zones associated with the possible pairs allows selection of the relevant players.
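
The zone definitions all rest on testing the capture condition $x^T M x \le r^2$ along a trajectory. The following is a minimal sketch of that test on a sampled trajectory, not a zone solver: the function names are hypothetical, the trajectory is assumed to be supplied as a list of reduced-state samples, and the quantifiers over all controls and evader policies are left to the caller.

```python
from typing import List, Optional, Sequence

Vector = Sequence[float]
Matrix = Sequence[Sequence[float]]


def quadratic_form(x: Vector, m: Matrix) -> float:
    """Return x^T M x for a small dense matrix M."""
    n = len(x)
    return sum(x[a] * m[a][b] * x[b] for a in range(n) for b in range(n))


def is_captured(x: Vector, m: Matrix, r: float) -> bool:
    """Capture condition x^T M x <= r^2 for one pursuer."""
    return quadratic_form(x, m) <= r * r


def first_capture_index(trajectory: List[Vector], m: Matrix, r: float) -> Optional[int]:
    """Index of the first sample satisfying the capture condition, else None.

    A result of None on every admissible trajectory over the relevant
    horizon corresponds to the "> r^2 for all t" clause of zones Z1 and Z6.
    """
    for k, x in enumerate(trajectory):
        if is_captured(x, m, r):
            return k
    return None
```

For instance, with $M$ the identity and samples approaching the origin along one axis, `first_capture_index` returns the first sample whose distance to the evader is within the lethal radius.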


4.  THE  HELP  ZONE 


The help zone $HP_{j0}(P_i,E) = Z1 \cup Z2 \cup Z3 \cup Z4$ is the zone in which $P_j$ plays a role in the $(P_i,P_j,E)$ game where $P_i$ and E play optimally.

The first way to compute the boundary of the help zone is by remarking that this boundary separates the optimal 2-vs.-1 game $(P_i,P_j,E)$ in which $P_j$ plays a role from the 2-vs.-1 game in which $P_j$ does not play a role; this latter game is, thus, equivalent to the 1-vs.-1 game $(P_i,E)$, since $P_j$ is irrelevant. Therefore, if pursuer $P_j$, at $t = t_0$, is located on this boundary, then the equations of the necessary conditions of optimality describing the 2-vs.-1 $(P_i,P_j,E)$ game and the equations describing the 1-vs.-1 $(P_i,E)$ game both hold, and consequently, the boundary of the help zone is determined by all the possible $x_j(t_0)$ that satisfy these conditions. Then all the equations must be jointly solved; this control approach seems heavy, but is easily shown to be equivalent to solving the equations of the 2-vs.-1 game and letting the strategic variable $z_j$ approach zero, since $P_j$ has less and less influence while reaching the boundary of the help zone, beyond which $P_j$ becomes useless. This is a relatively easy task. That smooth transition between the 1-vs.-1 and the 2-vs.-1 games is ensured by $z_j$. The absence of the strategic variable prevents any attempt to define a help zone for fixed terminal-time games.

The second way to derive the help zone is a gaming approach. The Hamiltonian of the 2-vs.-1 game is $H$, and the Hamiltonians of the 1-vs.-1 games $(P_i,E)$ and $(P_j,E)$ are $H_i$, $H_j$. Then it is always possible to set $H = H_i + H_j + C_{ij}$, where $C_{ij}$ is a correction term such that the equality holds. If $P_j$ collaborates with $P_i$ in the capture, then $H \ne H_i$ must hold. Thus, $H_j + C_{ij} \ne 0$ is the cooperation condition, and conversely, $H_j + C_{ij} = 0$, with the controls $u_i^*$ from the 1-vs.-1 game, gives the boundary of the help zone. Though different in their approaches, both methods are, in fact, equivalent.


5.  APPROXIMATIONS  OF  THE  NO-HELP  ZONE 


Zone Z6 is important because a pursuer located in zone Z6 of another pursuer can immediately be eliminated. Unfortunately, while the derivation of the help zone is more or less tractable, Z6 is more difficult to compute since a non-optimal play of the evader (optimal in the 2-vs.-1 sense) must be accounted for.

From the definition of Z6 given above, three simplifications that yield approximate results can be made.

i) $Z63P_{j0}(P_j/P_i^*) = \{x_j(t_0) \mid y_j^T(t) M_j y_j(t) > r_j^2,\ \forall u_j \in U_j,\ \forall t \in [t_0, t^*_{P_i}(v^*)]\}$,

where $y_j$ is the state vector corresponding to the Pareto game $(P_j,E)$ in which E is willing to be caught (rendezvous game). $y_j(t)$ obeys the same dynamics as $x_j(t)$ and $y_j(t_0) = x_j(t_0)$, but during the development of the game, the evader uses its control $v$ to help $P_j$ in the capture. If $P_j$ is not in the area of constant terminal time $t^*_{P_i}(v^*)$ in the Pareto game $(P_j,E)$, then $P_j$ belongs to Z6. The idea is that $(P_i,E)$ lasts at most $t^*_{P_i}(v^*)$; if the game $(P_j,E)$, with the cooperation of E, cannot be ended within this time, then, for sure, $P_j$ cannot help $P_i$ in catching E. Z63 is, of course, a coarse approximation of Z6; also $Z63 \subset Z6$.

ii) $Z62P_{j0}(P_j/P_i^*) = \{x_j(t_0) \mid y_j^T(t) M_j y_j(t) > r_j^2,\ \forall u_j \in U_j,\ \forall v \in V,\ \forall t \in [t_0, t^*_{P_i}(v^*)]\}$,

where $y_j$ is the state vector corresponding to the Pareto game $(P_j,E)$. Z62 uses the same approach as Z63, but now E is constrained to the reachable zone of the evader allowed by $P_i$. In the 1-vs.-1 game $(P_i,E)$ where $P_i$ plays optimally, E, playing all the possible policies $v$, can cover, before capture, a zone named the reachable zone of the evader allowed by $P_i$, and the largest time to capture is $t^*_{P_i}(v^*)$. In Z62, E is constrained to this zone that $P_j$ attempts to reach. Z62 is a better approximation than Z63 at the expense of some more computation; $Z63 \subset Z62 \subset Z6$.


iii) $Z61P_{j0}(P_j/P_i^*) = \{x_j(t_0) \mid y_j^T(t) M_j y_j(t) > r_j^2,\ \forall u_j \in U_j,\ \forall v \in V,\ \forall t \in [t_0, t^*_{P_i}(v)]\}$,

where $y_j$ is the state vector corresponding to the Pareto $(P_j,E)$ game. In Z61, E is also constrained to its reachable zone, but the exact time $t^*_{P_i}(v)$ allowed by $P_i$, associated with every policy $v$ of E, is computed and fixes the maximum duration of the Pareto game $(P_j,E)$ during which $P_j$ attempts to reach E.

For those games in which the 1-vs.-1 game $(P_i,E)$ lasts long enough compared with the inertia of $P_j$, a major simplification can be made if the control $u_j$ considered is fixed to be a trajectory orthogonal to the trajectory of the evader at time $t^*_{P_i}(v^*)$.

Z63 is to be used when only $t^*_{P_i}(v^*)$ is available, Z62 when the reachable zone of E according to the 1-vs.-1 game $(P_i,E)$ is known, and Z61 when the exact timing of this reachable zone is known.
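
The coarsest test, Z63, can be illustrated under a strong simplifying assumption that is not made in the text: both players have simple planar motion (speed-bounded, free heading), $M_j$ is the identity, and the evader is not confined to a line. In that hypothetical setting a cooperating evader and the pursuer close their gap at a rate of at most $p_j + e$, so the rendezvous game cannot end within the time $t^*_{P_i}(v^*)$ when the remaining gap still exceeds $r_j$. The function and parameter names below are illustrative only.

```python
import math


def in_z63_simple_motion(xj0, pj, e, rj, t_star):
    """Coarse no-help test in the spirit of Z63 (assumed simple planar motion).

    xj0    : initial relative position of E with respect to P_j (2-vector)
    pj, e  : maximum speeds of the pursuer and of the (cooperating) evader
    rj     : lethal radius of P_j
    t_star : duration of the optimal 1-vs-1 game (P_i, E)

    Returns True when even a cooperating evader cannot be reached in time,
    i.e. when P_j can be discarded as a helper of P_i.
    """
    distance = math.hypot(xj0[0], xj0[1])
    # Smallest separation achievable by t_star when both players close the gap.
    closest = distance - (pj + e) * t_star
    return closest > rj
```

Because the evader is assumed to cooperate, a `True` result is conservative: a pursuer rejected by this test would also be rejected by the finer tests Z62 and Z61.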


6. TWO-CUTTERS VS. RAILROAD EXAMPLE


6.1  Time-optimal  conditions 


The cutters $P_i$ attempt to intercept in minimum time a train E, constrained to a railway (axis $x_2$). The lethal area of the cutters is a circle of radius $r_i$; the cutters, of speed $p_i$, control their heading angle $u_i$, and the evader controls its speed $v$, bounded by $e$. The 1-vs.-1 version of this game has been referred to as "the wall pursuit game" by Isaacs [3], or sometimes as "ICBM vs. railroad". Due to its simplicity, this game can be solved geometrically or using control theory. The important case where E is between or surrounded by its two opponents, as well as its implications on the strategic variable ($z_i = 1$), are investigated.

The game of kind will be studied through two examples.

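The Hamiltonian, the optimal policies $(u_i^*, v^*)$, the costate vectors $(\lambda_i)$ and the states $(x_i)$ are given by

$$H = -\lambda_{11} p_1 \cos(u_1) + \lambda_{12}(v - p_1 \sin(u_1)) - \lambda_{21} p_2 \cos(u_2) + \lambda_{22}(v - p_2 \sin(u_2)) + 1 , \tag{70}$$

$$\cos(u_i^*) = \lambda_{i1} / \sqrt{\lambda_{i1}^2 + \lambda_{i2}^2} , \quad \sin(u_i^*) = \lambda_{i2} / \sqrt{\lambda_{i1}^2 + \lambda_{i2}^2} , \quad v^* = e \cdot \mathrm{sign}(\lambda_{12} + \lambda_{22}) , \tag{71}$$

where "sign" is the signum function,

$$\dot\lambda_{i1} = 0 , \quad \dot\lambda_{i2} = 0 , \tag{72}$$

at $t = t_f$: $\lambda_{11} = \nu_1 r_1 \cos(\alpha_1)$, $\lambda_{21} = \nu_2 r_2 \cos(\alpha_2)$, $\lambda_{12} = \nu_1 r_1 \sin(\alpha_1)$, $\lambda_{22} = \nu_2 r_2 \sin(\alpha_2)$, and $z_2 \nu_1 r_1 = \nu_2 r_2$,

$$x_{i1} = r_i \cos(\alpha_i) + \tau p_i \cos(\alpha_i) ,$$
$$x_{i2} = r_i \sin(\alpha_i) + \tau p_i \sin(\alpha_i) - \tau e \cdot \mathrm{sign}(\sin(\alpha_i)) + \tau e C_i , \tag{73}$$

with

$$C_i = \mathrm{sign}(\sin(\alpha_i)) - \mathrm{sign}(\sin(\alpha_1) + z_2 \sin(\alpha_2)) ;$$

$\alpha_i$ are line-of-sight angles relative to axis $x_1$ at the terminal time; $\tau$ is defined as $t_f - t$. The strategic variable appears in (73) only in a signum function, and therefore has a switching effect.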

The capture condition is expressed by

$$p_1 + z_2 p_2 - e\,|\sin(\alpha_1) + z_2 \sin(\alpha_2)| \ge 0 . \tag{74}$$

Three inequalities restrict the strategic variable: the capture condition, expressed at the terminal time, and the playability conditions applied to each pursuer, which state that the trajectories, as $\tau$ increases from zero, must go away from the terminal manifold. Defining $d_i$ as the tangent vector to the trajectory at time $\tau = t_f - t$, this geometrical condition is expressed by

$$d_i = (x_{i1} - r_i \cos(\alpha_i) ;\ x_{i2} - r_i \sin(\alpha_i)) ,$$
$$(x_{i1} - r_i \cos(\alpha_i)) \cos(\alpha_i) + (x_{i2} - r_i \sin(\alpha_i)) \sin(\alpha_i) \ge 0 . \tag{75}$$


The  strategic  variable  is  not  constant  when,  for  example,  the 
common  opponent  is  surrounded  by  the  pursuers.  That  possibility 
exists  whenever  a  team  of  at  least  N  members  faces  an  opponent  moving 
on  a  space  of  dimension  N  -  1.  Then  z^  adopts  a  value  such  that  the 
control  of  the  evader,  v,  disappears  from  the  Hamiltonian  at  the  last 
moments  of  the  game. 
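
The optimal policies (71) and the left-hand side of the capture condition (74) are simple enough to evaluate directly. The sketch below assumes the reading of (71) and (74) used in this section, with the first term of (74) taken to be $p_1$; the function names are illustrative and not part of the original derivation.

```python
import math


def sign(value: float) -> float:
    """Signum function: -1.0, 0.0 or 1.0."""
    return float((value > 0) - (value < 0))


def pursuer_heading(lam_i1: float, lam_i2: float) -> float:
    """Heading u_i* with cos(u) = lam_i1/|lam_i| and sin(u) = lam_i2/|lam_i| (eq. 71)."""
    return math.atan2(lam_i2, lam_i1)


def evader_speed(lam_12: float, lam_22: float, e: float) -> float:
    """Evader control v* = e * sign(lam_12 + lam_22) (eq. 71)."""
    return e * sign(lam_12 + lam_22)


def capture_margin(p1, p2, e, alpha1, alpha2, z2):
    """Left-hand side of the capture condition (74), as read here.

    A negative value means the terminal condition cannot be met
    for this choice of terminal angles and strategic variable z2.
    """
    return p1 + z2 * p2 - e * abs(math.sin(alpha1) + z2 * math.sin(alpha2))
```

With two pursuers of speed 1 on opposite sides of an evader of speed 1.5 (terminal angles $\pi/2$ and $-\pi/2$, $z_2 = 1$) the margin is positive, whereas a single such pursuer ($z_2 = 0$) gives a negative margin. This is the situation, mentioned earlier, in which two slow pursuers can catch a faster evader.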


6.2 Game of kind analysis


The trajectories are given by (73); the limit of the help zone for $P_2$ is obtained by letting $z_2$ approach zero in the trajectory (73). The result is given below; the variable is $\alpha_2$, while $\alpha_1$ and $\tau$ are given by the 1-vs.-1 game $(P_1,E)$.

$$x_{21} = r_2 \cos(\alpha_2) + \tau p_2 \cos(\alpha_2) ,$$
$$x_{22} = r_2 \sin(\alpha_2) + \tau p_2 \sin(\alpha_2) - \tau e \cdot \mathrm{sign}(\sin(\alpha_1)) . \tag{77}$$

From the above computations, various significant zones of capture and help for the location of $P_2$ at $t = t_0$ may be computed, relative to $x_1(t_0)$. For example, with $p_1 = e$, $e = 1$, $r_1 = 1$, $r_2 = 0.5$ and $x_1(t_0) = (4 ; -4)$, Figure 13 presents the zones for $p_2 = 1.5$, and Figure 14, for $p_2 = 0.5$, is an example where $P_2$ is unable to capture E in the game $(P_2,E)$ and $t^*_{P_2}(v^*)$ is infinite. Z6, the zone of capture by $P_1$ alone, is the most difficult to calculate. The three approximations are computed according to their definitions in Section 5.

The interval given by $x_2 \in [3.95, -1.79]$ in Figures 13 and 14 provides the segment that E can reach when $P_1$ plays optimally.


Z1: $P_2$ alone will capture E.
Z2: $P_2$ alone will capture an optimal E.
Z3: $P_2$ will be helped by $P_1$ in capturing E.
Z4: $P_1$ will be helped by $P_2$ in capturing E.
Z5: $P_1$ alone will capture an optimal E.
Z6: $P_1$ alone will capture E.
Approximations of Z6: Z61, Z62, Z63.

Figure 13. Cooperative zones for 2-vs.-1 example with pursuer speed
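
The backward trajectories (73) and their help-zone limit (77) can be traced numerically by sweeping the terminal angle. The sketch below is only an illustration of those formulas: the function names and the sample count are arbitrary, the evader speed is passed in as a constant along the backward path, and no attempt is made to reproduce Figures 13 and 14 or to filter out terminal angles that violate the playability condition (75).

```python
import math


def sign(value: float) -> float:
    """Signum function: -1.0, 0.0 or 1.0."""
    return float((value > 0) - (value < 0))


def backward_state(alpha, tau, r, p, v):
    """Reduced state of one pursuer, tau time units before capture.

    alpha : terminal line-of-sight angle
    r, p  : lethal radius and speed of the pursuer
    v     : signed speed of the evader, held constant along the path
    """
    x1 = (r + tau * p) * math.cos(alpha)
    x2 = (r + tau * p) * math.sin(alpha) - tau * v
    return x1, x2


def help_zone_boundary(alpha1, tau, r2, p2, e, samples=8):
    """Candidate boundary points of the help zone of P_2 (z_2 -> 0 limit).

    The evader control is fixed by the 1-vs-1 game (P_1, E) through
    sign(sin(alpha1)); alpha_2 is swept over a full turn.
    """
    v = e * sign(math.sin(alpha1))
    points = []
    for k in range(samples):
        alpha2 = 2.0 * math.pi * k / samples
        points.append(backward_state(alpha2, tau, r2, p2, v))
    return points
```

At $\tau = 0$ every point lies on the lethal circle of radius $r_2$, and as $\tau$ grows the circle expands at rate $p_2$ while drifting along the railway axis with the evader.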


V.  THE  LINEAR  QUADRATIC  GAME  OF  DEGREE 


1.  THE  QUADRATIC  TEAM  GAMES 

The hierarchical command and communication structure for a linear quadratic game of N pursuers and 1 evader is derived. The optimal solution and a simpler form of this solution are given, yielding the general solution to the N-pursuer vs. one-evader game. A sensitivity study leads to a simple hierarchical structure which greatly reduces the amount of computation required.

Two-player, linear, quadratic games have been extensively studied in the literature. According to Clemhout and Wan [21], conceptually, a linear quadratic model may be regarded as a Taylor approximation to more general models. Another advantage is that, computationally, the closed-loop control can, in principle, be numerically determined.

On the other hand, two main disadvantages must be overcome. A quadratic objective function implies satiability at some finite state vector. Unbounded controls make any pursuit-evasion, game-of-kind type of analysis impossible. More importantly, even for a two-person, two-state problem, the solution involves a Riccati system including six second-order differential equations.

Team games emphasize these disadvantages; the very definition of the performance index actually defines the type of cooperation between teammates; this issue is addressed in Chapter VII. The game becomes an N+1 point, boundary-value problem involving heavily interrelated Riccati equations and has a complex hierarchical structure.


2. 1-VS.-1 LINEAR QUADRATIC GAME OF DEGREE


The game studied is a linear, quadratic, generalized N-pursuer, one-evader game with the reduced state given by

$$\dot x_i = A_i x_i + B_i u_i + C_i v , \tag{78}$$

where $i = 1,2,\ldots,N$; $x_i$ is the reduced state vector, $u_i$ is the control vector of pursuer $P_i$ and $v$ is the control vector of the evader; $A_i$, $B_i$, $C_i$ are matrices of appropriate dimension.

The terminal condition is given by a manifold $g(x_i(t_f)) \le 0$ or, more specifically,

$$x_i^T(t_f) M_i x_i(t_f) \le r_i^2 , \tag{79}$$

and the performance index, to be minimized by the pursuers and maximized by the evader, is

$$J = 0.5 \int_{t_0}^{t_f} \Big( \sum_{i=1}^{N} (x_i^T Q_i x_i + u_i^T R_i u_i) - v^T S v \Big) dt , \tag{80}$$

where $M_i$, $Q_i$, $R_i$, $S$ are appropriate, positive definite, symmetric matrices.

Let $\lambda_i$ be the costate vector required by the game solution with the minimum principle, such that the transversality condition is given by the relation

$$\lambda_i(t_f) = \nu z_i M_i x_i(t_f) , \tag{81}$$

where $\nu$, a Lagrange multiplier, satisfies the terminal Hamiltonian

$$H(t_f) = \Big[ \sum_i (A_i x_i + B_i u_i + C_i v)^T \lambda_i + \tfrac{1}{2} \sum_i (x_i^T Q_i x_i + u_i^T R_i u_i) - \tfrac{1}{2} v^T S v \Big]_{t_f} = 0 . \tag{82}$$

The strategic variable $z_1 = 1$ for a 1-vs.-1 game, and it can be assumed that $z_1 = 1$ (most effective pursuer) and $z_j < 1$ for $j = 2,\ldots,N$ in general.

The open-loop Nash solution according to the minimum principle for the one-pursuer case ($i=1$) is given by

$$u_i^* = R_i^{-1} B_i^T T_i \Phi_i(t,t_0) x_i(t_0) ,$$
$$v^* = S^{-1} C_i^T T_i \Phi_i(t,t_0) x_i(t_0) , \tag{83}$$

where the state transition matrix satisfies

$$\dot\Phi_i(t,t_0) = (A_i + N_i T_i + L_i T_i) \Phi_i(t,t_0) , \quad \Phi_i(t,t) = I ,$$
$$N_i = B_i R_i^{-1} B_i^T , \quad L_i = C_i S^{-1} C_i^T ; \tag{84}$$

$T_i$ is the solution to the matrix Riccati equation

$$\dot T_i = -A_i^T T_i - T_i A_i - T_i (N_i + L_i) T_i + Q_i , \quad T_i(t_f) = \nu z_i M_i . \tag{85}$$

The solution is classical. The non-zero-sum game formulation, in which P and E do not have a similar performance index, is treated by Mohler, Kolodziej and Bugnon [22]. The vector $\lambda_i$ and the matrix $T_i$ are duplicated; (85) becomes a system of coupled Riccati equations. A procedure to decouple the Riccati equations is given by Simaan and Cruz [23], taking advantage of a preliminary solution common to several initial conditions, reducing the problem to the computation of successive linear equations.

A complete derivation of the two-player, linear, quadratic game is given by Ichikawa [24] and Hamalainen [25].
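
The Riccati equation of the 1-vs.-1 game is integrated backward from its terminal condition. A minimal sketch for the scalar case is given below, assuming the sign convention of (85) as read here, namely $\dot T = -2aT - (N+L)T^2 + Q$, and a fixed-step fourth-order Runge-Kutta scheme. The function names, the step count and the parameter `nl` (standing for the scalar $N + L$) are illustrative choices, not part of the original solution procedure.

```python
def riccati_rhs(t_val: float, a: float, nl: float, q: float) -> float:
    """Scalar form of (85): dT/dt = -2 a T - (N + L) T^2 + Q."""
    return -2.0 * a * t_val - nl * t_val * t_val + q


def integrate_riccati_backward(t_final_value, a, nl, q, horizon, steps=1000):
    """Integrate the scalar Riccati equation from t_f back to t_f - horizon.

    t_final_value is the terminal condition T(t_f) = nu * z * M.
    Returns T(t_f - horizon) computed with fixed-step RK4.
    """
    h = -horizon / steps  # negative step: integration runs backward in time
    t_val = t_final_value
    for _ in range(steps):
        k1 = riccati_rhs(t_val, a, nl, q)
        k2 = riccati_rhs(t_val + 0.5 * h * k1, a, nl, q)
        k3 = riccati_rhs(t_val + 0.5 * h * k2, a, nl, q)
        k4 = riccati_rhs(t_val + h * k3, a, nl, q)
        t_val += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return t_val
```

For $a = 0$ and $Q = 0$ the equation reduces to $\dot T = -(N+L)T^2$, whose solution satisfies $1/T(t) = 1/T(t_f) - (N+L)(t_f - t)$; this gives a closed form against which the integrator can be checked, and it also shows that the solution may escape in finite backward time when the terminal value is too large.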

3. TWO-VS.-ONE GAME SOLUTION

The two-pursuer, one-evader game requires two costate vectors $\lambda_i$ with transversality condition (81) for $i=1,2$. $u_i^*$, $i=1,2$, is given by (83) and

$$v^* = S^{-1} \Big[ \sum_{i=1}^{2} C_i^T T_i \Phi_i(t,t_0) x_i(t_0) \Big] . \tag{86}$$

The state transition matrices are given by (84), and $H(t_f)$ by (82).

$T_i$, on the other hand, is computed from


'^1^1  ^  ^"'^1^i"^i'^1“'^1^^1'''H^'^i''’®1^^i"'^iS^~^''2'^2^2  ' 


”^2*2  “  ^“'^2^2”^2'^2“'^2^^2''’^2^'^2''^2^^2"'^2^2^~^^i'^1^1  ' 


(87) 


where $T_1(t_f) = \nu z_1 M_1$, $T_2(t_f) = \nu z_2 M_2$.

The solution is very complex compared to the 1-vs.-1 game. The differential systems, (84) and (87), are tightly coupled, requiring parallel computation, whereas the solution to the 1-vs.-1 game could proceed in several steps.

The terminal values $T_i(t_f)$ must be selected properly to correspond to the initial conditions of the game. It might require a trial and error procedure whose computational burden would be somewhat reduced using the method described by Simaan and Cruz [23].

By analogy with the strategic variable, a strategic matrix $Z_i$ is defined as a symmetric, non-singular matrix satisfying the relation


•-1  -1  T  -1  T  -1  -1 

z/=  z/(a:+q^t/)  -  (A>Q,t/)z/ 


and 


^2^'V  - 


The similarities with the 1-vs.-1 game are now obvious; the equations are decoupled and can be solved in turn. However, the problem of guessing the strategic variable remains. Note the simplification that $Q_1 = Q_2 = 0$ would introduce.


4.  STRUCTURE  OF  AN  N-IDENTICAL-PURSUER, 
ONE-EVADER  LINEAR  QUADRATIC  GAME 


The solution for $Q=0$ takes the form given by

$$u_i^* = B^T T_i x_i , \tag{90}$$

$$v^* = C^T \sum_{i=1}^{N} T_i x_i , \tag{91}$$

$$\dot T_i = -T_i A - A^T T_i - T_i (N+L) T_i - T_i L \Big( \sum_{j \ne i} Z_j \Big) Z_i^{-1} T_i , \tag{92}$$

and $T_i(t_f) = \nu z_i M$, $\nu$ given by $H(t_f) = 0$,

$$\dot Z_i = Z_i A^T - A^T Z_i ,$$
$$\dot Z_i^{-1} = Z_i^{-1} A^T - A^T Z_i^{-1} , \tag{93}$$

with $Z_i(t_f) M x_i(t_f) = (z_i/z_1) M x_1(t_f)$ and $Z_i^{-1}(t_f) = [Z_i(t_f)]^{-1}$.


The summation sign ($\Sigma$) on the control of the evader $v^*$, and the independence of the controls $u_i^*$, as well as the form of the solution, show that each pursuer $P_i$ needs to receive information about the optimal control $v^*$, about the strategic matrices and about the strategic variables in order to carry out its own optimization algorithm. This results in $u_i^*$ while providing information about $T_i x_i$ (to compute $v^*$), $x_i(t_f)$ (to compute $Z_i$), and $J_i = \frac{1}{2} \int_{t_0}^{t_f} (u_i^T R_i u_i)\,dt$ (to compute $z_i$ and $z_j$).

It can be viewed as the nesting of three hierarchical structures that must iteratively be optimized, going from the lowest one to the highest one. Figure 15 shows that structure.

First, according to a given pair $(Z_i, z_i)$, the loop involving AL1 is taken into account, each pursuer producing $x_i(t_f)$. Then, according to these $x_i(t_f)$, the new $Z_i$ are to be computed by AL2 according to (93), and so on until the point where the new values of $Z_i$ match the old ones. As a last step, the $J_i$ are analyzed in AL3 to produce the new strategic vector $z = [z_1, z_2, \ldots, z_N]^T$. Consequently,

i) AL1 is purely deterministic: $v^* = C^T \sum_i T_i x_i$.

ii) AL2 is tactical: the main strategic option being defined by the strategic vector $z$, AL2 merely computes the trajectories, etc. to perform the capture. $\sum_j Z_j$ must be computed according to the differential equations given for $Z_i$, which depend on $x_i(t_f)$. An algorithm that ensures the convergence of $Z_i$ to yield the values of $x_i(t_f)$ produced by the pursuers must be added.

iii) AL3 is strategical. The values in the strategic vector define early strategical choices, viz. which is the most important player representing the main threat to be considered by the evader, which are the irrelevant pursuers (of class one), etc. $z_i$ must be determined according to both the a priori information and the a posteriori information based on $J_i$ for the previous vector $z$.


AL1 is known; AL2 is a simple converging algorithm, but so far, little has been developed to avoid an exhaustive search for AL3.

A few rules for finding AL3 are as follows:

i) Study the game of kind, trying to select the pursuers relevant to the game.

ii) Attempt to guess the value of $z_i$ according to the initial positions and some simple geometric rules.

iii) The capturability and playability conditions that must be satisfied by $z_i$ delimit the hyper-volume of definition of the vector $z$.

iv) A heuristic decomposition approach to the solution that will be introduced in the next chapter.
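
The inner loops of this structure repeat a pass until "the new values match the old ones", which is a fixed-point iteration. The skeleton below sketches only that stopping logic. The `update` callable is a hypothetical stand-in for one combined AL1 and AL2 pass (each pursuer producing its terminal state, then the strategic quantities being recomputed); the actual maps are not specified here, and convergence is not guaranteed for an arbitrary update.

```python
from typing import Callable, List, Sequence, Tuple


def iterate_until_match(
    update: Callable[[List[float]], List[float]],
    initial: Sequence[float],
    tol: float = 1e-9,
    max_iter: int = 1000,
) -> Tuple[List[float], int]:
    """Repeat `update` until successive iterates agree within `tol`.

    Returns the converged values and the number of passes used.
    Raises RuntimeError when no agreement is reached in `max_iter` passes.
    """
    current = list(initial)
    for iteration in range(1, max_iter + 1):
        new = update(current)
        # Largest componentwise change between the old and new values.
        change = max(abs(a - b) for a, b in zip(new, current))
        if change <= tol:
            return new, iteration
        current = new
    raise RuntimeError("iteration did not converge")
```

An outer AL3 search over the strategic vector would wrap a loop of this kind, calling it once per candidate vector, which is why reducing the number of candidates through the rules above matters.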

5.  A  SUB-OPTIMAL  STRUCTURE 

Equation (92) can be redefined as

$$\dot T_i = -T_i A - A^T T_i - T_i (N + L) T_i - \theta_i - \omega_i , \tag{94}$$

$$\theta_i = T_i L \Big( \sum_{j=1}^{i-1} Z_j \Big) Z_i^{-1} T_i , \quad \omega_i = T_i L \Big( \sum_{j=i+1}^{N} Z_j \Big) Z_i^{-1} T_i . \tag{95}$$

Except for the addition of $\theta_i$ and $\omega_i$ in (94) and the summation sign in (91), the N-vs.-1 game solution is identical to the 1-vs.-1 solution.

This suggests a sensitivity study of both $\theta_i$ and $\omega_i$. In some instances the $\theta_i$ are nearly equal and the $\omega_i$ are negligible when the pursuers represent equal threats to the evader. In this case, the difference between the 1-vs.-1 game and the N-vs.-1 game is seen to come hardly from the controls adopted by the pursuers but from the controls adopted by the evader. $\theta_i$ is an important term if $P_i$ plays a minor role in the capture. On the other hand, for all cases considered, $\omega_i$ is negligible when the playability condition is not violated, provided that the pursuers are classified in order of importance, $P_1$ being the most important one (the closest one to E for the minimum-time problems).

Consequently, the result is shown in Figure 16, where each pursuer solves (91)-(93), finding the couple $(T_i, x_i(t_f))$ corresponding to $x_i(t_0)$, and passes this information forward.


Figure 16. Sub-optimal simple field hierarchical structure for the N-vs.-1 linear quadratic game.


This structure takes advantage of the autonomy of the pursuers and breaks the complex solution into simpler steps; to an individual pursuer, the number and class of the co-pursuers are completely irrelevant.

The previous structure of Figure 15, though exact, is inferior in that the search for the optimal solution is quite tedious; AL3, in particular, is vague. The information passed (matrices $Z_i$, vectors $z$ and $x_i$, $C^T T_i x_i$) is substantial compared to the new structure.

A parallel structure is derived from this "ripple" structure in Figure 17, enhancing the independence of the individual pursuers with respect to the team, but at the expense of an increased number of equations to solve. A "minor" pursuer can be added, but the gain produced by this pursuer must be weighed against the amount of delay or computation that this very pursuer will have to cope with.

The structure depicted in Figure 17, and implemented for the examples considered, does not show any variation in state, control or performance greater than 0.1% of the rigorous solution.

Figure 17. Independent hierarchical structure for the N-vs.-1 linear quadratic game.


VI.  A  COMPOSITION  APPROXIMATING  ALGORITHM 


1.  INTRODUCTION 

The main difficulty in solving an N-vs.-1 pursuit-evasion game, using the necessary conditions of optimality, is to guess properly various variables such as the terminal hit points and the strategic variables. This problem gets more and more difficult as the number of pursuers increases. The 1-vs.-1 problem, though complex, is often solvable, whereas the 2-vs.-1 problem is nearly impossible to derive globally.

In this section, the solution to the 2-vs.-1 linear, minimum-time, team game is simplified using a composition approach. The two individual 1-vs.-1 games are assumed to be previously solved. The approach consists of computing an intermediate stage in which an "equivalent" 1-vs.-1 game is defined. The solution to this equivalent 1-vs.-1 game is used to find, in a similar way, the solution to the 2-vs.-1 game.

The study of the 2-vs.-1 game of kind done so far corresponds to a direct optimal approach. The direct composition is defined first; i.e., the problem addressed is how to compute the equivalent 1-vs.-1 game solution from the two 1-vs.-1 games.

The first step is to prove that such an equivalent game exists; therefore the composition is first derived analytically from the results of both the 1-vs.-1 and the 2-vs.-1 games.

2. ANALYTIC COMPOSITION

The computation of the initial position, dynamics and terminal manifold of a pursuer P whose effect on the game studied $(P,E)$ is identical to the effect of the pair $(P_1,P_2)$ is named composition.

Three major properties of the ideal composition must be relaxed:

i) P must be equivalent to $P_1$ and $P_2$ for every optimal and non-optimal play of the pursuers. So far, in every problem studied, the optimality of the pursuers was assumed since the main goal is to study the team and not the escape maneuvers. Thus, optimality will be assumed for the pursuers, even though this sets aside a non-optimal play of a $(P_1,P_2,E)$ game that might, in fact, correspond to an optimal play of a $(P_1,P_2,P_3,E)$ game.

ii) P must be equivalent to $P_1$ and $P_2$ for every optimal and non-optimal play of the evader. This is a reasonable request, but for the sake of simplicity, the evader will be assumed to behave optimally.

iii) To every position of $P_1$ and $P_2$ there corresponds a position of the equivalent pursuer P. Enforcing this property would solve the 2-vs.-1 game globally. The study of this functional will not be undertaken.

By composition, all the above simplifications are assumed. The scope of interest of the composition operator is considerably reduced, but, as shown, existence is the main obstacle to a broader definition.


In the following equations, the 2-vs.-1 game is written with
indices, whereas the equivalent 1-vs.-1 game has none. The pursuers
are identical. The minimum-time problem is analyzed, where $v$, the
control of the evader, is constrained by $|v| \le V$ and $u_i$, the
control of pursuer $P_i$, by $|u_i| \le U_i$.

The reduced states, the Hamiltonians and the optimal controls
are given by

$$\dot{x} = Ax + Bu + Cv,$$
$$\dot{x}_1 = Ax_1 + Bu_1 + Cv, \qquad (96)$$
$$\dot{x}_2 = Ax_2 + Bu_2 + Cv,$$

$$H = \lambda^T(Ax + Bu + Cv) + 1,$$
$$H_{1,2} = \lambda_1^T(Ax_1 + Bu_1 + Cv) + \lambda_2^T(Ax_2 + Bu_2 + Cv) + 1, \qquad (97)$$

$$v^*_{1,2} = V\,\mathrm{sign}\big((\lambda_1^T + \lambda_2^T)C\big),$$
$$v^* = V\,\mathrm{sign}(\lambda^T C), \qquad (98)$$

and the capture condition by

$$x^T(t_f)\,M\,x(t_f) \le r^2. \qquad (99)$$

By definition of the composition, both lines of (98) are identical.
Thus, $\lambda^T C$ and $(\lambda_1^T + \lambda_2^T)C$ must be identical
switching functions in order for the composition to exist.


A simple possible choice is

$$\lambda^T = k(\lambda_1^T + \lambda_2^T), \qquad (100)$$

in which $k$ is a positive constant.

Both the 2-vs.-1 and the 1-vs.-1 games must satisfy the necessary
conditions of optimality when (100) holds.

The maximum principle, which gives the control policies, is
obviously satisfied. The costate dynamics, given by

$$\dot{\lambda}^T = -\lambda^T A,$$
$$\dot{\lambda}_1^T = -\lambda_1^T A, \qquad (101)$$
$$\dot{\lambda}_2^T = -\lambda_2^T A,$$

do not conflict with the time derivative of (100), expressed as

$$\dot{\lambda}^T = k(\dot{\lambda}_1^T + \dot{\lambda}_2^T). \qquad (102)$$

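At $t = t_f$, the transversality conditions

$$\lambda^T(t_f) = 2\nu\,x^T(t_f)M,$$
$$\lambda_1^T(t_f) = 2\nu_{1,2}\,z_1 x_1^T(t_f)M, \qquad (103)$$
$$\lambda_2^T(t_f) = 2\nu_{1,2}\,z_2 x_2^T(t_f)M,$$

must be verified; (100) and (103) yield

$$k\big(\lambda_1^T(t_f) + \lambda_2^T(t_f)\big) = 2\nu\,x^T(t_f)M = 2k\,\nu_{1,2}\big(z_1 x_1^T(t_f) + z_2 x_2^T(t_f)\big)M, \qquad (104)$$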


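or, defining

$$x_e = z_1 x_1 + z_2 x_2, \qquad (105)$$

$$x(t_f) = x_e(t_f)/a,$$

$$-1/k = 2\nu_{1,2}\big(x_e^T M A x_e/a + x_e^T M C v - x_e^T M B\,U\,\mathrm{sign}(x_e^T M B)\big), \qquad (108)$$

$$a = \pm\sqrt{x_e^T(t_f)\,M\,x_e(t_f)}\,/\,r.$$

$a$ is chosen so that $k$ is positive, and then $\nu = a k \nu_{1,2}$.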

Note that, in finding $x(t_o)$, the position of the equivalent
pursuer at time $t = t_o$, only $a$ really matters. Thus, provided that
the 2-vs.-1 game is solved first, finding the equivalent pursuer
(i.e., composing) is an easy procedure for the linear system (96);
(100) is, in particular, a valid choice.
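The scaling step described above can be sketched numerically. This is only an illustration under stated assumptions: $M = I$, the positive root is taken for $a$, $z_1 = 1$, and the function name `compose_terminal_state` is hypothetical. The terminal points used are those listed for the first case of the time-optimal example later in this chapter.

```python
import math

def compose_terminal_state(x1_tf, x2_tf, z2, r=1.0, z1=1.0):
    """Return (a, x_tf) with x_e = z1*x1 + z2*x2 and x(t_f) = x_e(t_f)/a.

    Assumes M = I, so that a = ||x_e(t_f)|| / r (positive root).
    """
    x_e = [z1 * p + z2 * q for p, q in zip(x1_tf, x2_tf)]
    a = math.sqrt(sum(c * c for c in x_e)) / r
    if a < 1e-12:
        # x_e(t_f) = 0 means a = 0: no equivalent pursuer can be scaled
        raise ValueError("x_e(t_f) vanishes: a = 0")
    return a, [c / a for c in x_e]

# Terminal points of P1 and P2 with z2 = 0.2 (first tabulated case)
a, x_tf = compose_terminal_state((-0.2, 0.98), (0.2, -0.98), z2=0.2)
print(a, x_tf)
```

With these inputs $a$ comes out close to 0.8 and the equivalent pursuer ends on the unit capture circle; with $z_2 = 1$ the same terminal points give $x_e(t_f) = 0$, which is the degenerate case $a = 0$.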


3.  AN  APPROXIMATION  PROCEDURE 

The game is solved in two steps, according to Figure 18. The
first step is to compute the 1-vs.-1 equivalent game, namely $x(t_o)$,
and to find $t_f$ and $v^*$. Then, as a second step, the complex 2-vs.-1
game is transformed into a two-dimensional, time-varying, one-sided
control problem with fixed terminal time, much simpler to solve than
the original game.

The procedure is mainly concerned with the so-called direct
composition. The two main phases of the problem are to find $x(t_o)$
and to estimate $z_2$.

Figure 18. Composition approximating algorithm for 2-vs.-1 game.

The composition is performed using backward integration from a
terminal hit point given by

$$x(t_f) = x_e(t_f)\,k\,\nu_{1,2}/\nu, \qquad (109)$$

where $x_e$ is defined in (105). This relation merely expresses the
colinearity between $x$ and $x_e$ at $t = t_f$; the constant term
$k\nu_{1,2}/\nu$ simply adjusts the moduli in order to satisfy the
terminal condition (99). If $r = 1$, then it is readily seen that
$\|x_e(t_f)\| = \nu/(k\nu_{1,2})$.

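Then, integrating backwards in time, the states of the 2-vs.-1
and the equivalent 1-vs.-1 games are given by

$$x(t) = \Phi(t,t_f)x(t_f) + \int_{t_f}^{t}\Phi(t,\tau)Bu(\tau)\,d\tau + \int_{t_f}^{t}\Phi(t,\tau)Cv(\tau)\,d\tau,$$
$$x_1(t) = \Phi_1(t,t_f)x_1(t_f) + \int_{t_f}^{t}\Phi_1(t,\tau)Bu_1(\tau)\,d\tau + \int_{t_f}^{t}\Phi_1(t,\tau)Cv(\tau)\,d\tau, \qquad (110)$$
$$x_2(t) = \Phi_2(t,t_f)x_2(t_f) + \int_{t_f}^{t}\Phi_2(t,\tau)Bu_2(\tau)\,d\tau + \int_{t_f}^{t}\Phi_2(t,\tau)Cv(\tau)\,d\tau,$$

where $\Phi$, $\Phi_1$ and $\Phi_2$ are the corresponding state transition
matrices, and $A$, $B$, $C$ are time-invariant matrices.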

$P_1$ and $P_2$ have the same homogeneous part in their differential
equation (96), therefore

$$\Phi(t,t_f) = \Phi_1(t,t_f) = \Phi_2(t,t_f) = \exp(A(t - t_f)). \qquad (111)$$
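As a small numerical companion to (111), the sketch below evaluates the transition matrix $\exp(A(t - t_f))$ by a truncated Taylor series. The matrix $A$ used in the usage lines is a hypothetical rotation generator chosen only because its exponential is known in closed form; it is not the $A$ of the example, and the helper names are illustrative.

```python
import math

def mat_mul(X, Y):
    """Product of two square matrices stored as nested lists."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def expm_series(A, s, terms=30):
    """Truncated Taylor series for exp(A*s); adequate for small ||A*s||."""
    n = len(A)
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        # term becomes A^k s^k / k!
        term = [[v * s / k for v in row] for row in mat_mul(term, A)]
        result = [[result[i][j] + term[i][j] for j in range(n)]
                  for i in range(n)]
    return result

def transition(A, t, t_f):
    """State transition matrix Phi(t, t_f) = exp(A (t - t_f))."""
    return expm_series(A, t - t_f)

A = [[0.0, 1.0], [-1.0, 0.0]]          # hypothetical test matrix
Phi_back = transition(A, 0.25, 1.0)    # backwards from t_f = 1 to t = 0.25
Phi_fwd = transition(A, 1.0, 0.25)
print(mat_mul(Phi_back, Phi_fwd))      # close to the identity matrix
```

The backward and forward transition matrices multiply to the identity, which is the property the backward integration from the terminal hit point relies on.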


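The optimal controls are

$$u^* = -U\,\mathrm{sign}(\lambda^T B),$$
$$u_1^* = -U_1\,\mathrm{sign}(\lambda_1^T B),$$
$$u_2^* = -U_2\,\mathrm{sign}(\lambda_2^T B), \qquad (112)$$
$$v^*_{1,2} = V\,\mathrm{sign}(\lambda_1^T C + \lambda_2^T C),$$
$$v^* = V\,\mathrm{sign}(\lambda^T C).$$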

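Using

$$x(t_f) = \big(x_1(t_f) + z_2 x_2(t_f)\big)/a = x_e(t_f)/a, \qquad (113)$$

with (100) and (110), and $U = U_1 = U_2 = 1$, $V = 1$, then

$$x(t) = e^{A(t-t_f)}x(t_f) + \int_t^{t_f} e^{A(t-\tau)}B\,\mathrm{sign}\big((\lambda_1^T+\lambda_2^T)B\big)\,d\tau - \int_t^{t_f} e^{A(t-\tau)}C\,\mathrm{sign}\big((\lambda_1^T+\lambda_2^T)C\big)\,d\tau \qquad (114)$$

is derived. (114) can be expressed as

$$\begin{aligned}
x(t) ={}& \Big[e^{A(t-t_f)}x_1(t_f) + \int_t^{t_f} e^{A(t-\tau)}B\,\mathrm{sign}(\lambda_1^T B)\,d\tau - \int_t^{t_f} e^{A(t-\tau)}C\,\mathrm{sign}\big((\lambda_1^T+\lambda_2^T)C\big)\,d\tau\Big]/a \\
&+ z_2\Big[e^{A(t-t_f)}x_2(t_f) + \int_t^{t_f} e^{A(t-\tau)}B\,\mathrm{sign}(\lambda_2^T B)\,d\tau - \int_t^{t_f} e^{A(t-\tau)}C\,\mathrm{sign}\big((\lambda_1^T+\lambda_2^T)C\big)\,d\tau\Big]/a \\
&+ \int_t^{t_f} e^{A(t-\tau)}B\Big(\mathrm{sign}\big((\lambda_1^T+\lambda_2^T)B\big) - \mathrm{sign}(\lambda_1^T B)/a - z_2\,\mathrm{sign}(\lambda_2^T B)/a\Big)\,d\tau \\
&- \int_t^{t_f} e^{A(t-\tau)}C\Big(\mathrm{sign}\big((\lambda_1^T+\lambda_2^T)C\big) - \mathrm{sign}\big((\lambda_1^T+\lambda_2^T)C\big)/a - z_2\,\mathrm{sign}\big((\lambda_1^T+\lambda_2^T)C\big)/a\Big)\,d\tau. \qquad (115)
\end{aligned}$$

In (115), the terms $x_1(t)/a$ and $z_2x_2(t)/a$ can be recognized. Thus (115)
becomes

$$x(t) = x_e(t)/a + ER,$$

$$ER = \int_t^{t_f} e^{A(t-\tau)}\Big[B\Big(\mathrm{sign}\big((\lambda_1^T+\lambda_2^T)B\big) - \mathrm{sign}(\lambda_1^T B)/a - z_2\,\mathrm{sign}(\lambda_2^T B)/a\Big) - C\big(1 - (1+z_2)/a\big)\,\mathrm{sign}\big((\lambda_1^T+\lambda_2^T)C\big)\Big]\,d\tau. \qquad (116)$$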

Now, (116) is an important result, from which the procedure is
derived. ER, in particular, must be carefully studied: similar terms
subtract from each other and the result, a switching function, is
integrated against the exponential matrix. Thus, a small value of
ER is likely, especially when the number of switches is large.

In the sequel, ER will be assumed to be negligible compared to
$x_e(t)/a$; thus, an estimate of the upper bound of ER is of interest.

Such a bound can easily be obtained by squaring ER and using Schwarz's
inequality. Unfortunately, such an inequality has the drawback of
totally suppressing the compensations in ER, resulting in a very poor
upper bound.
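The effect described above can be illustrated on a deliberately simplified scalar toy problem. This is not the ER term of (116): the kernel $e^{-\tau}$, the unit interval and the equally spaced alternating switching function are all assumptions made for the sketch, and the function names are hypothetical. It only shows that cancellations make the integral shrink as switches are added, whereas a Schwarz-type bound ignores the sign pattern and stays fixed.

```python
import math

def toy_er(n_intervals, t_f=1.0, rate=1.0, steps_per_interval=200):
    """Midpoint-rule integral of exp(-rate*tau) * s(tau) on [0, t_f],
    where s alternates between +1 and -1 on n_intervals equal intervals."""
    total = 0.0
    width = t_f / n_intervals
    h = width / steps_per_interval
    for k in range(n_intervals):
        sign = 1.0 if k % 2 == 0 else -1.0
        for m in range(steps_per_interval):
            tau = k * width + (m + 0.5) * h
            total += sign * math.exp(-rate * tau) * h
    return total

def schwarz_bound(t_f=1.0, rate=1.0):
    """sqrt(int exp(-2*rate*tau) dtau * int s^2 dtau), using s^2 = 1."""
    kernel_sq = (1.0 - math.exp(-2.0 * rate * t_f)) / (2.0 * rate)
    return math.sqrt(kernel_sq * t_f)

for n in (1, 2, 4, 8):
    print(n, abs(toy_er(n)), schwarz_bound())
```

In this toy setting the integral decreases steadily as the number of intervals doubles, while the bound does not move, which is the sense in which the inequality suppresses the compensations.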

Since (116) holds at any time, it holds for both $t = t_o$ and
$t = t_f$, where ER equals zero, thereby verifying hypothesis (113).

As a conclusion, the approximate method performs the composition
from the 2-vs.-1 game, therefore requiring the estimation of $z_2$;
then, according to (116), the initial position of the equivalent
pursuer is $x(t_o) = x_e(t_o)/a$. Then the 1-vs.-1 game is solved:
$v^*$, the optimal control of the evader, and $t_f$, the terminal time,
are computed. As a last step, the control problem of finding the
trajectories and controls of the two pursuers, knowing $v^*$ and $t_f$,
is solved.

The estimation of $z_2$ is the most delicate problem in the
method; the information available consists of the 1-vs.-1 games. The
estimate of the strategic variable will take full advantage of the
whole prior information of the game: the 1-vs.-1 Nash and Pareto
(i.e., cooperative) game solutions, the study of the game of kind
and, particularly, the computation of the help zones.

Let $t_1$ and $t_2$ be the terminal times of the individual 1-vs.-1
$(P_1,E)$ and $(P_2,E)$ games. Let $t_{1m}$ and $t_{1M}$ be two limits such
that, because of the very position of $P_1$, cooperation is possible
with $P_2$ only if $t_{1m} \le t_1 \le t_{1M}$. Then, if $t_1 \le t_{1m}$,
$z_2 = 0$ ($P_1$ is so close to E that $P_2$ is useless), and if
$t_1 \ge t_{1M}$, $z_2 = \infty$ ($P_1$ is so far away from E that $P_2$
does not need any such help).

The simplest formula verifying the above requirements is

^2  =  ’^(^l  - 
where  k  is  a  constant. 


Using the same approach, if $t_{2m}$ and $t_{2M}$ can be found such that
cooperation is possible with $P_1$ only if $t_{2m} \le t_2 \le t_{2M}$, then

(118)

respects the limit conditions, but the problem is to fix $k$. When
$t_1 = t_2$, both pursuers are likely to be equally dangerous to the
evader, therefore $\hat{z}_2 = 1$ should be enforced. This remark yields
the formula that will be used for $\hat{z}_{21}$:


(119) 


If none of the four limits $t_{1m}$, $t_{2m}$, $t_{1M}$ and $t_{2M}$ can be
computed, then $t_{1m} = t_{2m} = 0$ and $t_{1M} = t_{2M} = \infty$ are
assumed, and the estimate becomes

$$\hat{z}_{22} = (t_1/t_2)^2, \qquad (120)$$

whereas the simplest estimate for $z_2$ is

$$\hat{z}_{23} = t_1/t_2. \qquad (121)$$
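The two limit-free estimates are simple enough to evaluate directly. The sketch below assumes that (120) is the squared ratio of the two 1-vs.-1 terminal times and (121) the plain ratio; the function name `z2_estimates` is hypothetical. The terminal times used are the $t_1$, $t_2$ values tabulated for the first and third cases of the time-optimal example.

```python
def z2_estimates(t1, t2):
    """Estimates of z2 from the individual 1-vs.-1 terminal times only.

    z22 assumes the form (t1/t2)**2 for (120); z23 is the ratio of (121).
    """
    ratio = t1 / t2
    return {"z22": ratio ** 2, "z23": ratio}

print(z2_estimates(1.0025, 1.1075))   # first tabulated case
print(z2_estimates(1.1075, 1.0025))   # third tabulated case
```

Both estimates are below 1 when $P_1$ is the faster pursuer and above 1 otherwise, and they equal 1 when $t_1 = t_2$, which is the equal-threat requirement stated above.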


The four time limits are computed as follows:

i) When $t_1 \le t_{1m}$, then $P_2$ does not play any role ($z_2 = 0$). The
best case for $P_2$ is when E is willing to be caught: the corresponding
time is the terminal time of the Pareto $(P_2,E)$ game. If $t_1$ is
even smaller than this time, then, for sure, $P_2$ is useless.

Thus: $t_{1m}$ is the terminal time of the Pareto $(P_2,E)$ game;
$t_{2m}$ is the terminal time of the Pareto $(P_1,E)$ game.

ii) When $t_1 \ge t_{1M}$, $P_1$ does not play any role, and $z_2$ becomes
infinite. When $t_1 = t_{1M}$, then $P_1$ is located on the limit of
the help zone of $P_2$, limiting, by definition, the cooperation zone.
Thus a possible bound is to take the maximum of the terminal times
of the $(P_1,E)$ games with $P_1$ located on the help zone. The help zone
is usually asymmetrical, since, for a given distance $x_1(t_o)$, the
cooperation is best when the evader is surrounded by the pursuers.
Thus, every 1-vs.-1 trajectory intersects the boundary of the
help zone at a very different time, depending on the position of
the intersection. Thus, $t_{1M}$ will be taken as the terminal time on
the limit of the help zone, along the trajectory going through
$P_1(t_o)$ (an example will be shown); $t_{1M} - t_1$ represents the time
separating $P_1$ from the help zone. A similar definition is
adopted for $t_{2M}$.

4.  TIME  OPTIMAL  EXAMPLE 

The game studied has the reduced state equations

$$\dot{x}_i = Ax_i + Bu_i + Cv \qquad (122)$$

and capture is achieved whenever

$$x_i^T(t_f)\,M\,x_i(t_f) \le r_i^2 \qquad (123)$$

with $M = I$, $r_i = 1$.

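The minimum-time problem with identical pursuers is considered.
Applying the necessary conditions of optimality yields the optimal
controls, the costate dynamics and the formula for the Lagrange
multiplier as

$$u_i^* = -1\cdot\mathrm{sign}(\lambda_i^T B),$$
$$v^* = 1\cdot\mathrm{sign}\Big(\sum_i \lambda_i^T C\Big),$$
$$\dot{\lambda}_i^T = -\lambda_i^T A, \qquad (124)$$
$$\lambda_i^T(t_f) = 2\nu z_i x_i^T(t_f)M,$$
$$\nu \ge 0 \;\rightarrow\; -\sum_i z_i x_i^T A x_i + \sum_i |z_i x_i^T B| - \Big|\sum_i z_i x_i^T C\Big| = 1/(2\nu),$$
$$\nu \le 0 \;\rightarrow\; \sum_i z_i x_i^T A x_i + \sum_i |z_i x_i^T B| - \Big|\sum_i z_i x_i^T C\Big| = -1/(2\nu).$$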

For a fixed $x_1(t_f)$, Figure 19 shows the possible locations
of $x_2(t_f)$ according to the amount of cooperation allowed by $z_2$.

It is remarkable that the trajectory of $P_1$ is not at all
affected by $z_2$. Since this trajectory is fixed, and differs from
the equivalent 1-vs.-1 trajectory, the cooperation of $P_2$ is required,
defining a minimum for $z_2$ equal to 0.38 in this case.

Figure 20 shows that, when $P_1$ is chosen very close to the
equivalent 1-vs.-1 trajectory, then $P_2$ is constrained to a very
limited arc, because the requested help is very specific.


[Figure legend: 1-vs.-1 game; 2-vs.-1 game; trajectory of $P_1$ for
$x_{11}(t_f) = -0.25$; trajectory of $P_2$ for $x_{11}(t_f) = -0.25$;
location of $P_2(t_f)$ as a function of $z_2$ for $t_f = 0.5$.]

Figure 20. Sensitivity relative to second pursuer close to 1-vs.-1


The solution to the 1-vs.-1 game is given in Figure 21. The
area in which capture can be avoided is very small. This is due to
the stable eigenvector of the matrix A added to the advantage in
control of the pursuer with respect to the evader, illustrated by
the matrices B and C. The lines of constant terminal time tend to
be deformed along the unstable eigenvector $x_2 = (-2-\sqrt{11})\,x_1/2$,
more favorable to the escape.

[Figure legend: semi-permeable line; usable part of the terminal
manifold; trajectory; constant $t_f$; switching line.]

Figure 21. Time-optimal solution for 1-vs.-1 game.


The game is symmetrical, so the trajectories and the constant
terminal time lines are plotted only over half of the space.

The solution to the cooperative (Pareto) game is given in
Figure 22.

[Figure legend: semi-permeable line; usable part of the terminal
manifold; trajectory; constant $t_f$; switching line.]

Figure 22. Time-optimal solution for 1-vs.-1 Pareto game.

The lines of constant terminal time are wider apart and there
is no place where capture is impossible; this is expected since, in
the Pareto game, E collaborates with P.

The computations for three cases are summarized in Table 3.

Cases 1 and 3 correspond to a low and a high value of $z_2$, quantifying
the cooperation. Case 2 corresponds to the worst possible case:
both $z_2 = 1$ and $a = 0$ force a bad result. But the expected failure
of the method in that case is not a problem since it can be computed
that $v^* = 0$; this corresponds to a case of maximum advantage of the
pursuit team, and this case is treated in Chapter VII.

The controls corresponding to the three cases are plotted in
Figure 23. In case 1, the equivalent pursuer has the same control
as $P_1$; in case 3, the same as $P_2$, whereas case 2 corresponds to an
equal threat from both pursuers. Of course, both $v^*$ and $t_f$ are
identical since it is the goal of the procedure to estimate those
variables.

$P_1$ and $P_2$, trying to collaborate as much as possible, adopt opposite
controls, so that E, playing a control opposing one of the pursuers,
then plays in favor of the second pursuer.

The three different estimates of $z_2$ are given in Table 3. $\hat{z}_{21}$
gives excellent results in cases 1 and 3, but fails in case 2, where
$\hat{z}_{23}$ is superior; $\hat{z}_{22}$ corresponds to $\hat{z}_{21}$ when the
various limiting times are not available. The results are centered
around $z_2 = 1$; in case of uncertainty, the best estimate is a rather
equal cooperation between the pursuers.

Figure 24 shows, as an example, how $t_{1M}$ is computed for case 1.


Table 3. Time-optimal approximate solution comparisons for three
different degrees of cooperation.

                                   Case 1           Case 2            Case 3

Examples             P1(to)        (-5.36,13.0)     (-7.67,19.2)      (-7.79,19.4)
                     P2(to)        (7.78,-19.4)     (5.47,-12.5)      (5.35,-13.0)

2-vs.-1 exact        P1(tf)        (-0.2,0.98)      (-0.20,0.98)      (-0.20,0.98)
solution             P2(tf)        (0.2,-0.98)      (0.20,-0.98)      (0.20,-0.98)
                     z2            0.2              1.                5.
                     tf            1.00             1.00              1.00

Exact composition    P(to)         (-5.38,13.08)    (4.27,-16.16)     (5.38,-13.08)
                     P(tf)         (-0.20,0.98)     (-0.98,-0.20)     (0.20,-0.98)

Estimating z2        t1            1.0025           1.100             1.1075
                     t2            1.1075           1.0425            1.0025
                     t1m           0.860            0.780             0.7525
                     t2m           0.7525           0.850             0.860
                     t1M           1.40             1.27              1.32
                     t2M           1.32             1.40              1.40
                     z21 (est.)    0.199            1.85              3.007
                     z23 (est.)    0.905            1.05              1.105
                     z22 (est.)    0.81             1.103             1.22

Estimated ||x_e||                  0.787            0.943             4.03
Exact a                            0.8              0.                4.

Approximated         P(to)         (-4.81,11.61)    (2.618,-4.412)    (4.71,-11.34)
composition          Switch in v   0.589            0.44              0.584
                     Exact switch  0.602            -                 0.602

VII. A CONTROLLABILITY STUDY FOR LINEAR QUADRATIC TEAM GAMES

1.  INTRODUCTION 


The fixed terminal time aspect of the linear, quadratic team
games is studied in this chapter. A controllability study of the
1-vs.-1 version of the problem was made by Behn and Ho [26]. These
results are adapted to suit the N-vs.-1 team game under study, and
the controllability of the team is used as a measure of the efficiency
of the individual pursuers, providing a sufficient condition for a
pursuer to be useful to a team.

The evader is shown to be neutralized by the maximum controllability
strategies. In cases where these strategies are game-optimal,
this yields a simple method of solving the difficult problem
of optimal disposition of a team of pursuers, by providing a choice of
optimal conditions that simplifies the search for the optimal solution.

2.  FORMULATION  OF  THE  GAME 


The state of pursuer $P_i$, controlling $u_i$, is defined by the
linear dynamic system

$$\dot{x}_i = F_i(t)x_i + G_i(t)u_i(t), \qquad x_i(t_o) = x_{io}. \qquad (125)$$

The state of the evader, controlling $v$, takes a similar form:

$$\dot{x}_e = F_e(t)x_e + G_e(t)v(t), \qquad x_e(t_o) = x_{eo}, \qquad (126)$$

where $i = 1, 2, \ldots, N$; $x_i, x_e \in R^n$, $u_i \in R^m$, $v \in R^p$, and
$F_i(t)$, $G_i(t)$, $F_e(t)$, $G_e(t)$ are matrices of appropriate dimensions.

The performance index is given in terms of a miss distance at
the fixed terminal time $t_f$ and an energy integral component as

$$J = \frac{a^2}{2}\sum_i \|x_i(t_f) - x_e(t_f)\|^2_{A^TA} + \frac{1}{2}\int_{t_o}^{t_f}\Big(\sum_i \|u_i(t)\|^2_{R_i} - \|v(t)\|^2_{R_e}\Big)\,dt, \qquad (127)$$

where $A^TA$ selects the relevant components of the state; $R_i$, $R_e$ are
appropriate positive-definite matrices, and $a^2$ is a weighting factor
such that, if $a^2 \to \infty$, the problem is to capture the evader with
minimum energy. Here $a^2 \to \infty$ is used in the sense that
$a^2\|x_i(t_f) - x_e(t_f)\|^2_{A^TA} = 0$ if $\|x_i(t_f) - x_e(t_f)\|_{A^TA} = 0$,
and $\infty$ otherwise.

Such a performance index, including a summation of the pursuers'
achievements, does not favor as much cooperation between teammates
as a performance index involving a minimum operator on the miss
distances would, as discussed in Chapter VIII.


A reduced state vector is defined, for each pursuer, as

$$y_i(t) = A\big[\Phi_i(t_f,t)x_i(t) - \Phi_e(t_f,t)x_e(t)\big], \qquad (128)$$

where $\Phi_i$ and $\Phi_e$ are the state transition matrices corresponding to
(125) and (126). $y_i(t)$ is the terminal miss $A(x_i(t_f) - x_e(t_f))$
predicted at time $t$, on the basis that no control will be applied during
the interval $[t,t_f]$. Consequently, a new set of state differential
equations is defined by

$$\dot{y}_i(t) = G_i(t_f,t)u_i(t) - G_e(t_f,t)v(t), \qquad y_i(t_o) = y_{io}, \quad i = 1, 2, \ldots, N, \qquad (129)$$

where $G_i$ and $G_e$ are time-varying matrices satisfying

$$G_i(t_f,t) = A\Phi_i(t_f,t)G_i(t),$$
$$G_e(t_f,t) = A\Phi_e(t_f,t)G_e(t), \qquad (130)$$

and the performance index is restated as

$$J = \frac{a^2}{2}\sum_i \|y_i(t_f)\|^2 + \frac{1}{2}\int_{t_o}^{t_f}\Big(\sum_i \|u_i(t)\|^2_{R_i} - \|v(t)\|^2_{R_e}\Big)\,dt. \qquad (131)$$
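The reduced state can be checked on a scalar toy case. The sketch below assumes scalar, time-invariant dynamics $\dot{x}_i = f_i x_i$ and $\dot{x}_e = f_e x_e$ with no control applied, so that $\Phi(t_f,t) = e^{f(t_f - t)}$; the function names and numerical values are hypothetical. Under these assumptions the predicted miss is the same at every time and equals the actual terminal miss.

```python
import math

def predicted_miss(x_i, x_e, t, t_f, f_i, f_e, A=1.0):
    """Scalar reduced state: y_i(t) = A*(Phi_i(t_f,t)*x_i - Phi_e(t_f,t)*x_e)."""
    phi_i = math.exp(f_i * (t_f - t))
    phi_e = math.exp(f_e * (t_f - t))
    return A * (phi_i * x_i - phi_e * x_e)

def reduced_gain(g, t, t_f, f, A=1.0):
    """Scalar reduced input gain: G(t_f,t) = A * Phi(t_f,t) * g."""
    return A * math.exp(f * (t_f - t)) * g

# Uncontrolled scalar trajectories starting from x_i(0) = 2, x_e(0) = 1
f_i, f_e, t_f = -0.1, -0.2, 0.75
state = lambda x0, f, t: x0 * math.exp(f * t)
for t in (0.0, 0.4, t_f):
    print(t, predicted_miss(state(2.0, f_i, t), state(1.0, f_e, t),
                            t, t_f, f_i, f_e))
```

Because the prediction does not change along an uncontrolled trajectory, only the controls move $y_i$, which is what the reduced dynamics (129) express.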

Then, applying the classical calculus of variations, and without
lumping the controls of the pursuers into a single control vector,
the open-loop optimal controls can be shown to be

$$u_i^*(t) = -a^2R_i^{-1}(t)G_i^T(t_f,t)\,y_i(t_f),$$
$$v^*(t) = -a^2R_e^{-1}(t)G_e^T(t_f,t)\Big(\sum_i y_i(t_f)\Big). \qquad (132)$$

The optimal controls are a function of the terminal misses, multiplied
by a gain which varies according to a position estimate. The form of
the control of the pursuers does not depend on the number of teammates,
which underlines the relative independence of each pursuer; on the
other hand, the evader must compound the various threats represented
by the pursuers.

To derive the feedback solutions, two approaches are possible.

The first one defines the controls, for $a = 1$, as

$$u_i^* = -R_i^{-1}G_i^TK_i^{-1}y_i,$$
$$v^* = -R_e^{-1}G_e^T\Big(\sum_i K_i^{-1}y_i\Big), \qquad (133)$$

where matrix $K_i$ obeys

$$\dot{K}_i(t_f,t)K_i^{-1}(t_f,t)y_i(t) = -G_i(t_f,t)R_i^{-1}(t)G_i^T(t_f,t)K_i^{-1}(t_f,t)y_i(t) + G_e(t_f,t)R_e^{-1}(t)G_e^T(t_f,t)\sum_j \big(K_j^{-1}(t_f,t)y_j(t)\big), \qquad (134)$$

$$K_i(t_f,t_f) = I.$$

$I$ is the identity matrix. (134) can immediately be simplified for a
two-player game, $i = j = 1$, but for the team game under study,
simplification of (134) requires the introduction of a strategic matrix
$Z_i(t)$ according to Chapter V, defined as a symmetric, non-singular
matrix which satisfies

$$K_i^{-1}(t_f,t)y_i(t) = Z_i(t)K_1^{-1}(t_f,t)y_1(t). \qquad (135)$$

Then the propagation of $K_i$ simplifies into

$$\dot{K}_i(t_f,t) = -G_i(t_f,t)R_i^{-1}(t)G_i^T(t_f,t) + G_e(t_f,t)R_e^{-1}(t)G_e^T(t_f,t)\Big(\sum_j Z_j(t)\Big)Z_i^{-1}(t). \qquad (136)$$

By differentiation of (135), $\dot{Z}_i(t)$ can be shown to be null, a direct
consequence of the reduction chosen in (128). Then $Z_i(t)$ is a constant
matrix, which might be a bit surprising, but is true only for
the game-optimal strategies.

This approach to the solution is used by Behn and Ho [26] for
the simpler two-player case; its advantage is to lend itself to nice
controllability interpretations. The major drawback is the inversion
of $K_i$ required to compute the controls. A better computational form
was adopted in Chapter V, where

$$u_i^* = -R_i^{-1}G_i^TT_iy_i,$$
$$v^* = -R_e^{-1}G_e^T\Big(\sum_i T_iy_i\Big), \qquad (137)$$

and $T_i$ propagates according to Riccati equations that are readily
simplified, using the same strategic matrix approach, into

$$\dot{T}_i(t) = T_i(t)\Big[G_i(t_f,t)R_i^{-1}(t)G_i^T(t_f,t) - G_e(t_f,t)R_e^{-1}(t)G_e^T(t_f,t)\Big(\sum_j Z_j(t)\Big)Z_i^{-1}(t)\Big]T_i(t), \qquad (138)$$

$$T_i(t_f) = a^2I.$$

This form is computationally superior, and a rigorous derivation for
the simpler two-player, linear, quadratic games of this kind can be
found in Ichikawa [24].

Using $Z_i$, the optimal control of the evader can be restated as

$$v^*(t) = -a^2R_e^{-1}(t)G_e^T(t_f,t)\Big(\sum_i Z_i(t)\Big)K_1^{-1}(t_f,t)y_1(t). \qquad (139)$$

The computation of the optimal controls requires that the N
matrices $K_i$ be inverted. According to Ho, Bryson and Baron [27],
the non-singularity of $K_i$ is equivalent to the non-existence of a
conjugate point, in the one-vs.-one case. Existence of $K_i^{-1}$ is studied
in terms of the team controllability.


3.  CONTROLLABILITY  STUDY  OF  THE  TEAM  DIFFERENTIAL  GAME 


From (136), the N matrices $K_i$ are integrated, from the terminal
time, as

$$K_i(t_f,t) = I/a^2 + \int_t^{t_f}G_i(t_f,\tau)R_i^{-1}(\tau)G_i^T(t_f,\tau)\,d\tau - \sum_j\int_t^{t_f}G_e(t_f,\tau)R_e^{-1}(\tau)G_e^T(t_f,\tau)Z_j(\tau)Z_i^{-1}(\tau)\,d\tau. \qquad (140)$$

Matrices $Z_i(t)$ being constant, a more compact notation is

$$K_i(t_f,t) = I/a^2 + M_{ri}(t_f,t) - M_e(t_f,t)\Big(\sum_{j\ne i}Z_j\Big)Z_i^{-1}, \qquad (141)$$

where

$$M_{ri}(t_f,t) = M_i(t_f,t) - M_e(t_f,t),$$
$$M_i(t_f,t) = \int_t^{t_f}G_i(t_f,\tau)R_i^{-1}(\tau)G_i^T(t_f,\tau)\,d\tau, \qquad (142)$$
$$M_e(t_f,t) = \int_t^{t_f}G_e(t_f,\tau)R_e^{-1}(\tau)G_e^T(t_f,\tau)\,d\tau.$$

From [26], $M_i$ and $M_e$ are the reduced controllability grammians
of the pursuer and of the evader, and, for the 1-vs.-1 game, where
$P_i$ faces E, $M_{ri}$ is called the relative controllability grammian of
pursuer $P_i$. It expresses the control superiority of pursuer $P_i$ over
the evader, in the 1-vs.-1 game.
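The grammians of (142) and the compact form (141) can be evaluated on a scalar toy case. The sketch assumes scalar, time-invariant plants with reduced gain $G(t_f,\tau) = e^{f(t_f-\tau)}g$ (that is, $A = 1$), constant scalar strategic variables $z_j$, and hypothetical function names and numerical values.

```python
import math

def grammian(f, g, r, t, t_f, steps=2000):
    """Trapezoidal estimate of M(t_f,t) = int_t^{t_f} G(t_f,tau)^2 / r dtau
    for a scalar plant with G(t_f,tau) = exp(f*(t_f - tau)) * g."""
    h = (t_f - t) / steps

    def integrand(tau):
        G = math.exp(f * (t_f - tau)) * g
        return G * G / r

    total = 0.5 * (integrand(t) + integrand(t_f))
    for k in range(1, steps):
        total += integrand(t + k * h)
    return total * h

def k_matrix(i, z, M_i, M_e, a=1.0):
    """Scalar version of (141): 1/a^2 + (M_i - M_e) - M_e*(sum_{j!=i} z_j)/z_i."""
    others = sum(z[j] for j in range(len(z)) if j != i)
    return 1.0 / a ** 2 + (M_i - M_e) - M_e * others / z[i]

M_p = grammian(f=-0.1, g=1.0, r=1.0, t=0.0, t_f=0.75)
M_e = grammian(f=-0.2, g=0.5, r=1.0, t=0.0, t_f=0.75)
print(M_p, M_e, k_matrix(0, [1.0, -1.0], M_p, M_e))
```

With two pursuers and $z_1 = -z_2$, the evader terms cancel and $K_1$ reduces to $1/a^2 + M_1$, so in this scalar setting it stays positive and invertible whatever the evader's grammian.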

A sufficient condition to ensure existence of matrices $K_i^{-1}$ is
the positive semidefiniteness of the N matrices

$$M_{ri}(t_f,t) - M_e(t_f,t)\Big(\sum_{j\ne i}Z_j\Big)Z_i^{-1} \ge 0, \qquad (143)$$

where, for the 1-vs.-1 case, a sufficient condition is

$$M_{ri}(t_f,t) \ge 0. \qquad (144)$$


If all the pursuers are playing a "positive" role in the game,
pursuer $P_i$ must be better off as a team member than playing alone.
Therefore a sufficient condition for pursuer $P_i$ to be efficient is
that

$$-M_e(t_f,t)\Big(\sum_{j\ne i}Z_j\Big)Z_i^{-1} \ge 0 \qquad (145)$$

holds. Since $M_e$ is positive definite, another sufficient condition
is

$$\Big(\sum_{j\ne i}Z_j\Big)Z_i^{-1} \le 0. \qquad (146)$$

Both conditions provide criteria to select the relevant pursuers to
form a team.

(143) can also be expressed as

$$M_i(t_f,t) - M_e(t_f,t)\Big(\sum_j Z_j\Big)Z_i^{-1} \ge 0. \qquad (147)$$

Then a possible strategy for $P_i$, to which corresponds a strategic
matrix $Z_i$, is to try and maximize (147), in order to be as "efficient"
as possible. This strategy is named the maximum controllability
strategy since it is related to the relative controllability grammians.
If all the pursuers play a maximum controllability strategy, then
the sum of the terms as (147), i.e.

$$\sum_{i=1}^{N}\Big(M_i(t_f,t) - M_e(t_f,t)\Big(\sum_j Z_j\Big)Z_i^{-1}\Big)Z_i^2, \qquad (148)$$

is also maximized, since $Z_i^2$ is a non-singular, positive-definite
matrix. (148) can be reorganized as

$$\sum_i M_i(t_f,t)Z_i^2 - M_e(t_f,t)\Big(\sum_j Z_j\Big)\Big(\sum_i Z_i\Big). \qquad (149)$$

(149) is named the team controllability grammian. Due to the symmetry
of the strategic matrices, the product

$$\Big(\sum_j Z_j\Big)\Big(\sum_i Z_i\Big) = \Big(\sum_i Z_i\Big)^2 \qquad (150)$$

is positive definite, and so are $M_i$, $M_e$ and $Z_i^2$. Then, in order to
play a maximum controllability strategy, (149) is maximized when (150)
is minimized, so that $\sum_i Z_i = 0$ must hold or, for all
$i = 1, \ldots, N$, the pursuers must play a strategy such that

$$Z_i = -\sum_{j\ne i}Z_j. \qquad (151)$$

From (151) and (139), the optimal control of the evader corresponding
to the maximum controllability strategy of the team is identically
null. If the game is to be played, the best strategy of the evader
is to do... nothing! The evader is denied any incentive to move and
is said to be neutralized by this (possibly non-game-optimal)
strategy played by the team.

4.  MOST  CONTROLLABLE  DISPOSITION  OF  PURSUIT  TEAM 


The most controllable disposition of a pursuit team is such that
the maximum controllability strategy is also the game-optimal strategy
for the team of pursuers. And, as shown, the evader is neutralized
by that disposition. Then the team is said to be optimal.

If the matrices defined in (145) are positive definite, then
multiplying each end by the vector $K_i^{-1}(t_f,t)y_i(t)$ and using
definition (135) yields the scalar inequality

$$-y_i^T(t)K_i^{-T}(t_f,t)M_e(t_f,t)\sum_{j\ne i}K_j^{-1}(t_f,t)y_j(t) \ge 0 \qquad (152)$$

and, as $t \to t_f$, $M_e(t_f,t)$ is proportional to

$$C(t_f) = AG_e(t_f)R_e^{-1}(t_f)G_e^T(t_f)A^T. \qquad (153)$$

Then, since $K_i(t_f,t) = I/a^2$ at $t = t_f$, (152) becomes

$$\alpha\,y_i^T(t_f)C(t_f)\sum_{j\ne i}y_j(t_f) \le 0, \qquad (154)$$

where $\alpha$ is a positive constant.


In order for pursuer $P_i$ to play a maximum controllability
strategy, and for a given miss distance, $y_i(t_f)$ must minimize the
vector product (154); it expresses the fact that an efficient teammate
must, at the terminal time, face the vectorial sum of the
miss vectors of the remaining pursuers, modified by the constant
matrix $C(t_f)$, expressing the evader's interest and capabilities, to
cut the evader's main exit direction.
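The geometric statement above has a simple closed form when the miss distance is fixed. The sketch below assumes a Euclidean constraint $\|y_i(t_f)\| = d$ and minimizes the product $y_i^T C w$, where $w$ stands for the sum of the other pursuers' terminal miss vectors; the function name and the numerical values are hypothetical.

```python
import math

def best_terminal_miss(C, w, d):
    """Minimiser of y^T C w over ||y|| = d: y = -d * (C w) / ||C w||."""
    Cw = [sum(C[i][j] * w[j] for j in range(len(w))) for i in range(len(C))]
    norm = math.sqrt(sum(c * c for c in Cw))
    if norm == 0.0:
        raise ValueError("C w = 0: every y of norm d gives the same product")
    return [-d * c / norm for c in Cw]

# Two identical pursuers with C = I: w is simply the miss vector of P1
C = [[1.0, 0.0], [0.0, 1.0]]
y1 = [0.6, 0.8]
y2 = best_terminal_miss(C, y1, d=1.0)
print(y2)
```

With $C = I$ and two pursuers, the minimizer is the vector opposite to the miss vector of the first pursuer, i.e. the second pursuer faces the first one across the evader.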

If a pursuer $P_i$ is to be joined to the team ($P_j$), $i \ne j$, to
provide a maximum controllability strategy over the evader, then
(154) must be minimized. Or, in order for a team of pursuers to be
optimal facing an evader, the set of equations (154) must be
jointly minimized. (154) provides N - 1 independent terminal-time
equations for N pursuers, greatly simplifying the formidable task
of finding the optimal controls of this (N + 1)-point boundary-value
problem.

One important consequence for games with two identical pursuers is
that (151) becomes $Z_1 = -Z_2 = I$, because of the very definition of
the strategic matrices (135). Then, for a fixed $x_1(t_o)$, $x_2(t_f)$ is
chosen so as to minimize (154). The trajectories corresponding to the
two-player game $(P_1,E)$ and the trajectory of $P_1$ in the three-player
game $(P_1,P_2,E)$ must match at $t = t_o$. Due to the symmetry of the
differential game, to locate $P_2$ "optimally" so as to obtain the most
controllable disposition of the pursuit team, the two-pursuer game does
not need to be studied at all; i.e., the one-pursuer game $(P_2,E)$
equations are merely integrated backwards in time from the terminal
state $y_2(t_f)$ given by (154), to find, at $t = t_o$, the position of
$P_2$ which is optimal.

The above procedure applies, in fact, whenever N-1 pursuers
are already located; then the study of the (N-1)-pursuer game gives
the solution. Then the trajectories and controls of the N-pursuer
game are computed from a single one-sided control problem, since
the game-optimal strategy is a maximum controllability strategy for
the optimal team, and, as shown, it implies that the evader is
neutralized, i.e., $v^* = 0$, and N-1 independent terminal-time relations
minimizing (154) are known to hold.

5.  EXAMPLE 

A  second-order,  two-pursuer  game  is  studied,  in  which  both 
pursuers  are  identical,  with  A  =  I,  =  I,  =  I,  ^  If 

t^  =  0.75  and 

$$F_i = F_e = \begin{bmatrix} -0.1 & 0.1 \\ 0.1 & -0.2 \end{bmatrix}. \qquad (155)$$

Figure 25 shows the real trajectories, with $x_e(t_o) = (0, 0)$. The
line is traced which corresponds to every possible position $x_2(t_o)$
for a fixed $x_1(t_o)$ and such that the terminal optimal condition (154)
is minimized. This line of perfect help passes exactly through the
point $-x_1(t_o)$, as expected. This line is limited because the miss
distances of $P_1$ and $P_2$ must be positive. Considering $t_f$ as a
parameter, the line of perfect help can be thought of as separating
the area in which $P_2$ strives to decrease the terminal miss from the
area in which $t_f$ is large enough to allow $P_2$ to think primarily of
reducing its energy spending.


Figure  25.  Most  controllable  two-pursuer  team. 

A more painstaking study, to compute the lines of constant
terminal payoff, is required to use the 1-vs.-1 trajectory to locate
optimally pursuer $P_2$, constrained to a zone at $t = t_o$, in order to
minimize the terminal payoff. This task is more feasible for
minimum-time problems.


VIII.  STRUCTURAL  CHOICES  OF  A  STOCHASTIC 
DIFFERENTIAL  TEAM  GAME 

1.  INTRODUCTION 

Stxschastic  differential  games  in  which  both  peurties  meUce  noisy 
measurements  of  the  state  have  a  closure  problem  due  to  the  fact  that 
an  optimal  control  should  take  advantage  of  the  estimate  of  the  error 
made  by  the  opponent  in  its  own  estimate,  but  this  estimate,  in  tom, 
is  not  exact  and  must  be  estimated  by  the  former  player,  etc. 

On  the  other  hand,  games  in  which  only  one  party  makes  perfect 
measurements  do  not  have  such  problems.  Moreover,  the  separation 
principle,  dividing  the  stochastic  game  into  an  estimation  problem 
followed  by  a  game  analysis,  can  easily  be  studied.  In  the  problem 
studied below, the evader makes perfect measurements.

The team stochastic differential game arises between two sonar
(or radar) systems and a target. The target uses a mixed strategy
by adding white noise into its controller to hamper the tracking of
the sonars. The study, relying on the classical calculus of
variations, provides insight into the solutions of difficult problems
specific to team games. In particular, various cooperative modes
between the two sonar systems are shown to produce different results.
The game is studied according to a centralized or a decentralized
organization; the two are equivalent for passive targets that do not
use their white noise control capabilities. Conditions are derived
under which the Nash-optimal choices of a structure are demonstrated.


To illustrate further team-game problems, a non-optimal, generated,
scalar case is presented in which the decentralized structure is
proven superior, and where singularities are studied. Due to the
open-loop nature of the problem, a Stackelberg (hierarchical) game
can be defined in the same context.

2.  GAME  STATEMENT 

Two ships, equipped with sonar (or radar) systems, are tracking
a target whose dynamics are assumed to have the linear form

$$\dot{x} = F(t)x + G(t)v , \qquad (156)$$

where F(t) and G(t) are known matrices of dimension n×n and n×p; x is
an n-vector of state variables, and v is a p-control vector.

The  action  of  the  ships  as  a  team  is  not  the  a  posteriori  result 
of  a  coalition  dictated  by  a  common  interest  but  an  a  priori  assump¬ 
tion  whereby  the  overall  performance  of  the  team  supersedes  the 
individual  payoffs. 

The target, detected at time $t_o$, is tracked up to time $t_f$. In
the meantime, it uses its control capabilities to achieve both
good accuracy in reaching its destination and a maximum in the
tracking error estimate made by the ships.

Ho [28] suggests a control law of the form $u = K(t)x + \zeta$. K(t)
is a p×n feedback matrix and $\zeta$ a p-vector of Gaussian white noise
components with statistical parameters $E(\zeta(t)) = 0$ and
$E(\zeta(t)\zeta^T(\tau)) = T(t)\delta(t-\tau)$, where T(t) is a p×p matrix
used by the target as a control variable. Through the control of the
statistical parameters in the matrix T, the target is able to play a
mixed strategy to complicate the estimation algorithm of the tracking
team, but at the expense of self-inflicted perturbations.

The ships simultaneously, but on an independent basis, record
continuous noisy measurements of the state x of the target, in the
following form:

$$z_i = H_i(t)x + s_i , \quad i = 1,2. \qquad (157)$$

$H_i$ is a known q×n matrix, and $s_i$ is a q-vector containing Gaussian
white noise components such that $E(s_i) = 0$ and
$E(s_i(t)s_i^T(\tau)) = S_i(t)\delta(t-\tau)$, where $S_i(t)$ is given. For
convenience, $s_1$ and $s_2$ are assumed to be statistically independent,
i.e., $S_{12} = E(s_1(t)s_2^T(\tau)) = 0$.

From these two measurements, and knowing the target dynamics
(156), the tracking team is able to design a (Kalman) filter by
optimizing its gain according to the performance index. If T(t)
were known by the team, then $\zeta$ could be treated as a mere corrupting
noise of given characteristics, to be averaged out by the filter.
Unfortunately, the actual characteristics of the corrupting noise $\zeta$
are unknown to the pursuers since it is controlled by the target.
At this point, two approaches are possible. The brute-force method
assumes in (156) a value $T_{max}$ of maximum corrupting-noise covariance,
covering the worst possible case, and performs the state estimation
on that basis. This method has the obvious advantage of great
simplicity but might be overwhelmingly penalizing if the target decides
not to use a mixed control at all.

Consequently, the second method, namely the gaming approach, is
considered here. The overall problem might be decomposed into two
steps: first, the game is solved by the tracking team, which
computes the game-optimal controls played by the target. Then, on line,
when the target is actually detected, the tracking is performed
according to these assumptions. Of course, the actual target might
not adopt the controls predicted by the game analysis, but the
tracking team is guaranteed a minimum payoff by playing according to
the game-optimal controls. The same type of approach is used by
the target to predict the optimal filter gains on which to base its
optimal control strategy.

A remark of importance is that neither the tracking team nor
the target is able to compute the exact performance. Actually, the
error estimate covariance computed on line by the tracking team
corresponds to the truth only in the physically improbable event of an
optimal play made by both parties. Throughout, it will be assumed
that a referee has access to both sides to compute the real
performance.

Therefore,  the  problem  as  introduced  is  a  game  and  not  a 
classical  estimation  problem.  In  the  sequel,  the  optimality  of  the 
filters  does  not  refer  to  the  control  actually  played  by  the  target 
but  the  filters  are  to  be  understood  as  game-optimal  filters. 

Team game analysis, as opposed to the 1-vs.-1 game problem
such as studied by Speyer [23], is the main objective. In particular,
the various communication structures that define cooperation levels
reflected by the very choice of the performance index, together with
structural choices that supersede control strategies, are examined.


3.  COMMUNICATION  STRUCTURE 


Perfect  information  is  assumed;  in  particular,  the  form  of  the 
target  dynamics  and  control,  the  measurement  equations,  the  form  of 
the  estimator  filters  as  well  as  the  statistical  parameters  of  the 
various random variables and the initial states are known by both
parties. 

Several filtering structures to estimate the state of the target
from the measurements are discussed but, throughout, the various
filters have the linear form

$$\dot{\hat{x}}_i = (F + GK)\hat{x}_i + L_i(z_i - \hat{z}_i) , \qquad (158)$$

where $\hat{z}_i = H_i\hat{x}_i$ and $L_i$ is the filter gain (control) to be optimized.
The linear form of the equations and the Gaussian assumptions
allow a restatement of the game in terms of a differential equation
governing the propagation of the covariance of the state variables
by

$$\dot{X} = (F + GK)X + X(F + GK)^T + GTG^T , \qquad X = E(x(t)x^T(t)) . \qquad (159)$$


The  target  makes  perfect  measurements  of  its  own  state;  the 
separation  principle  holds,  and  the  stochastic  game  is  reformulated 
as  a  deterministic  game  of  perfect  information,  belonging  to  the 
class  of  problems  investigated  by  Isaacs  [3]. 

Similarly, a linear estimator based on measurements $z_i = H_i(t)x + s_i$
has an error-covariance matrix propagating as

$$\dot{P}_i = (F + GK - L_iH_i)P_i + P_i(F + GK - L_iH_i)^T + L_iS_iL_i^T + GTG^T , \qquad (160)$$

$$P_i = E((x - \hat{x}_i)(x - \hat{x}_i)^T) .$$
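
The covariance form of the game in (159) and (160) can be sketched numerically. The following is a minimal illustration, not the report's simulation code: the function names `covariance_rates` and `euler_propagate`, the use of constant matrices, and the forward-Euler scheme are all assumptions made for the example.

```python
import numpy as np

def covariance_rates(X, P, F, G, K, T, H, L, S):
    """Right-hand sides of (159) and (160) for one tracker (hypothetical helper)."""
    A = F + G @ K                  # closed-loop target dynamics F + GK
    noise = G @ T @ G.T            # self-inflicted process noise G T G^T
    X_dot = A @ X + X @ A.T + noise
    A_err = A - L @ H              # estimation-error dynamics F + GK - L H
    P_dot = A_err @ P + P @ A_err.T + L @ S @ L.T + noise
    return X_dot, P_dot

def euler_propagate(X, P, F, G, K, T, H, L, S, dt, steps):
    """Forward-Euler propagation with constant matrices (an assumption)."""
    for _ in range(steps):
        X_dot, P_dot = covariance_rates(X, P, F, G, K, T, H, L, S)
        X = X + dt * X_dot
        P = P + dt * P_dot
    return X, P
```

With scalar data F = 0, G = 1, K = 0, H = 1, the gain L = P/S makes the error covariance rate equal to -P^2/S + T, which vanishes for P = 2, S = 4, T = 1.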

The cooperation levels between the members of the team are
related to the density of their communication network. For example,
when the two ships are not allowed, or do not possess the ability, to
improve their estimate by comparing their results, then the target
strives to increase the uncertainty of the best performer according
to the performance index

$$J = \mathrm{tr}\{0.5\,AA^TX(t_f) - 0.5\int_{t_o}^{t_f}\min(P_1(t),P_2(t))\,dt\} , \qquad (161)$$

with $AA^T$ an n×n weighting matrix and tr the trace operator. Though
the teammates play independently, it can be shown that they cannot solve
the game unless they can compare $P_1$ with $P_2$ to compute the target
control that is used. The operator "min" can be eliminated by
defining an equivalent 3-vs.-1 game, somewhat as suggested by
Gourishankar and Salama [30], as shown in Section III.2.

On the other hand, a fairly easy generalization of a result
due to Speyer [29] can be derived by defining J as

$$J = \mathrm{tr}\{0.5\,AA^TX(t_f) - 0.5\int_{t_o}^{t_f}\sum_i P_i(t)\,dt\} . \qquad (162)$$

In that case, it can be shown that no communication is required
between teammates; therefore, the structure of the game does not
possess feedback links. Though simpler in form, this performance
index is ill defined since, from the target's point of view, a
large $P_1$ does not compensate for a small $P_2$.


When a permanent link is established between the pursuers,
then cooperation can be total. The tracking team comes up with a
unique estimate and associated covariance P(t) according to the
performance index, adopted from now on:

$$J = \mathrm{tr}\{0.5\,AA^TX(t_f) - 0.5\int_{t_o}^{t_f}P(t)\,dt\} . \qquad (163)$$


Computing directly the overall optimal tracking strategy will
not be attempted, but rather, structures are defined a priori in
order to compare the classical 1-vs.-1 game with the more peculiar
team games. Two filtering structures are developed in detail in the
next section. In the first one, referred to henceforth as the
centralized structure, the two input measurements are merged together
using a zero-memory filter to produce a unique early estimate; this
is conceptually equivalent to a measurement which is the object of
the study. Therefore, the game becomes a 1-vs.-1 game of the
classical type, such as studied by Speyer [29].

In  the  second  case,  named  decentralized  structure,  each  team 
member  optimizes  its  very  own  filter  gain,  producing  an  estimate 
which  is,  in  a  later  stage,  combined  with  that  of  its  copursuer  to 
produce the overall team estimate. This 2-vs.-1 game structure is
compared  to  the  previous  one. 

The estimation counterpart to this game problem, i.e., the
brute-force method proposed earlier, is also interesting. Both
structures can be compared with the overall optimal linear filter
based on the optimization of the gains $L_1$ and $L_2$ as

$$\dot{\hat{x}} = (F + GK)\hat{x} + L_1(z_1 - \hat{z}_1) + L_2(z_2 - \hat{z}_2) , \qquad \hat{z}_i = H_i\hat{x} . \qquad (164)$$

On the other hand, the decentralized structure, because of the
common information it is based upon, is not to be compared with a
decomposition scheme, such as arises when the optimal smoothing solution
for linear dynamic systems is derived as the combining of two Kalman
filters working on intervals $(t_o,t)$ and $(t,t_f)$, for nonoverlapping
measurement subsets.

4.  THE  TWO  STRUCTURES 

4.1  The  centralized  structure 

Figure 26 shows a block-diagram representation of the centralized
structure. The two measurements $z_1$ and $z_2$ collected from the sonar
channels are merged in a zero-memory, linear filter (LF) of the form

$$\hat{z} = z_1 + C(z_2 - z_1) . \qquad (165)$$

C is to be computed so as to minimize the mean square error:

$$S = E((\hat{z} - z)(\hat{z} - z)^T) , \qquad (166)$$

where z = Hx. For $H_1 = H_2 = H$, S can be computed as

$$S = S_1 + S_{12}C^T - S_1C^T + CS_{12} - CS_1 + C(S_1 + S_2 - 2S_{12})C^T , \qquad (167)$$


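or alternatively as

$$S = S_1 - M_1(S_1 - S_{12}) + (C - M_1)(S_1 + S_2 - 2S_{12})(C - M_1)^T , \qquad (168)$$

with $M_1 = (S_1 - S_{12})(S_1 + S_2 - 2S_{12})^{-1}$ and $M_2 = I - M_1$.

Figure 26. Block diagram of the centralized game structure.
$(K_1, T_1)$ is the target optimal controller.

Now, since $M_1$ and $(S_1 + S_2 - 2S_{12})$ are positive definite, (168) is
minimized for $C = M_1$, and then, using (157), (165) can be expressed as

$$\hat{z} = Hx + (M_1s_2 + M_2s_1) . \qquad (169)$$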
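
The choice of the merging gain C can be checked numerically. The sketch below is only an illustration under stated assumptions: the helper names `merge_gain` and `merged_covariance` are hypothetical, the covariance matrices are made up for the example, and the gain is written as $(S_1 - S_{12})(S_1 + S_2 - S_{12} - S_{12}^T)^{-1}$, which coincides with the expression used in the text when $S_{12}$ is symmetric.

```python
import numpy as np

def merge_gain(S1, S2, S12):
    """Gain C minimizing the covariance of s1 + C (s2 - s1) (hypothetical helper)."""
    D = S1 + S2 - S12 - S12.T
    return (S1 - S12) @ np.linalg.inv(D)

def merged_covariance(C, S1, S2, S12):
    """Covariance of the merged noise (I - C) s1 + C s2 for an arbitrary gain C."""
    A = np.eye(S1.shape[0]) - C    # weight left on the first channel
    return A @ S1 @ A.T + A @ S12 @ C.T + C @ S12.T @ A.T + C @ S2 @ C.T

# Illustrative (made-up) noise levels with independent channels, S12 = 0.
S1 = np.diag([8.0, 2.0])
S2 = np.diag([4.0, 4.0])
S12 = np.zeros((2, 2))
M1 = merge_gain(S1, S2, S12)
S_best = merged_covariance(M1, S1, S2, S12)   # diag(8/3, 4/3)
```

For independent scalar channels the merged covariance reduces to $S_1S_2/(S_1 + S_2)$; moving C away from the computed gain can only increase the trace of the merged covariance.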


An equivalent noise s is defined as

$$s = M_1s_2 + M_2s_1 . \qquad (170)$$

It can be verified that E(s) = 0 and the covariance of s, i.e.,
$S = E(s(t)s^T(\tau))$, is

$$S = M_1S_2M_1^T + M_2S_1M_2^T , \qquad (171)$$

since $S_{12} = 0$ is assumed. Thus, the result of the merging of the
two measurements is conceptually equivalent to a single measurement
vector $\hat{z}$, corrupted by the Gaussian white noise s. By using the
zero-memory linear filter, the problem is reduced to a simple
1-vs.-1 game. This 1-vs.-1 game is then solved using calculus of
variations. The performance index and the variational Hamiltonian
are defined as

$$J(K_1,T_1) = \mathrm{tr}\{X_1(t_f) - 0.5\int_{t_o}^{t_f}P(t)\,dt\} , \qquad (172)$$

$$H = \mathrm{tr}\{-0.5P + \Lambda_x((F+GK_1)X_1 + X_1(F+GK_1)^T + GT_1G^T)$$
$$\qquad + \Lambda_p((F+GK_1-LH)P + P(F+GK_1-LH)^T + LSL^T + GT_1G^T)\} . \qquad (173)$$


$\Lambda_x$ and $\Lambda_p$ are the Lagrange-multiplier matrices associated with the
covariances X and P. $T_1$ is the target variance control and P is
the estimator error variance. $K_1$ is the feedback-control matrix.
The costate variables propagate as

$$\dot{\Lambda}_x = -\Lambda_x(F+GK_1) - (F+GK_1)^T\Lambda_x , \qquad \Lambda_x(t_f) = I ,$$

$$\dot{\Lambda}_p = 0.5\,I - \Lambda_p(F+GK_1) - (F+GK_1)^T\Lambda_p , \qquad \Lambda_p(t_f) = 0 . \qquad (174)$$


The necessary conditions of optimality yield the optimal
control gain L as

$$\frac{\partial H}{\partial L} = (-2HP + 2SL^T)\Lambda_p = 0 , \qquad (175)$$

and since it can be checked from (174) that $\Lambda_p$ is non-singular except
at $t = t_f$, the optimal control L for this 1-vs.-1 case is

$$L = PH^TS^{-1} . \qquad (176)$$

Substituted back into (160), this yields the familiar Kalman-filter
formulation,

$$\dot{P} = (F+GK_1)P + P(F+GK_1)^T - PH^TS^{-1}HP + GT_1G^T , \qquad (177)$$

optimized according to the given couple $(K_1, T_1)$.
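
The step from the gain condition to the Riccati form can be verified numerically. The sketch below is an illustration only: the helper names and all matrix values are hypothetical, with `A` standing for the closed-loop matrix $F + GK_1$ and `W` for the noise term $GT_1G^T$.

```python
import numpy as np

def p_dot(P, L, A, H, S, W):
    """Error-covariance rate for an arbitrary gain L, as in (160)."""
    A_err = A - L @ H
    return A_err @ P + P @ A_err.T + L @ S @ L.T + W

def kalman_gain(P, H, S):
    """Gain L = P H^T S^{-1} of (176)."""
    return P @ H.T @ np.linalg.inv(S)

def riccati_rate(P, A, H, S, W):
    """Right-hand side of the Riccati form (177)."""
    return A @ P + P @ A.T - P @ H.T @ np.linalg.inv(S) @ H @ P + W

# Made-up data, chosen only to exercise the formulas.
A = np.array([[-0.1, 0.1], [0.1, -0.2]])
H = np.eye(2)
S = np.diag([8.0, 4.0])
W = 0.5 * np.eye(2)
P = np.array([[2.0, 0.5], [0.5, 1.0]])

L_star = kalman_gain(P, H, S)
rate_star = p_dot(P, L_star, A, H, S, W)
rate_off = p_dot(P, L_star + 0.05 * np.ones((2, 2)), A, H, S, W)
```

Completing the square shows that any other gain adds the term $(L - PH^TS^{-1})S(L - PH^TS^{-1})^T$ to the rate, so `rate_off` has a larger trace than `rate_star`, and `rate_star` agrees with `riccati_rate`.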


The target control $T_1$ takes values in the interval $[0, T_{max}]$,
and the feedback matrix $K_1$ is bounded by the matrices $K_{min}$ and $K_{max}$.
The optimal control law $K_1$ is computed as

$$\frac{\partial H}{\partial K_{1ij}} = \{(2X_1\Lambda_x + 2P\Lambda_p)G\}_{ji} \quad \begin{cases} > 0 \rightarrow K_{1ij} = K_{ij\min} \\ < 0 \rightarrow K_{1ij} = K_{ij\max} \end{cases} \qquad (178)$$

and $T_1$ is

$$\frac{\partial H}{\partial T_{1ij}} = \{G^T(\Lambda_x + \Lambda_p)G\}_{ji} \quad \begin{cases} > 0 \rightarrow T_{1ij} = 0 \\ < 0 \rightarrow T_{1ij} = T_{ij\max} \end{cases} \qquad (179)$$


The above generalizes the results obtained in [29] where, for
the scalar case, it is proven that no singular arc can occur in (178).
However, singularities play a major role for $T_1$ in the derivation of
the solution of the 1-vs.-1 game. This point is illustrated by an
example in [29] which is unfortunately in error.
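
The bang-bang selection of the target controls in (178) and (179) can be sketched as follows. This is a minimal illustration, not the report's algorithm: the function names are hypothetical, the index convention (the derivative with respect to entry (i, j) picks entry (j, i) of the bracketed matrix, as matrix calculus gives for a trace) is an assumption because the subscripts are hard to read in the source, and singular arcs (switching function exactly zero) are deliberately not handled.

```python
import numpy as np

def bang_bang(switch, lower, upper):
    """Elementwise choice: lower bound where switch > 0, upper bound otherwise.

    A zero switching value (possible singular arc) is NOT treated specially here.
    """
    return np.where(switch > 0, lower, upper)

def target_controls(X, P, Lam_x, Lam_p, G, K_min, K_max, T_max):
    """Target feedback K (p x n) and noise covariance T (p x p) from the switching functions."""
    switch_K = (2.0 * (X @ Lam_x + P @ Lam_p) @ G).T   # entry (i, j) holds the (j, i) element
    switch_T = G.T @ (Lam_x + Lam_p) @ G
    K = bang_bang(switch_K, K_min, K_max)
    T = bang_bang(switch_T, np.zeros_like(T_max), T_max)
    return K, T
```

For example, in the scalar case with X = P = 1, $\Lambda_x = 1$ and $\Lambda_p = -2$, both switching functions are negative, so the sketch returns the upper bounds for K and T.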


4.2  The  decentralized  structure 


In the 2-vs.-1 team game depicted in Figure 27, each ship
computes its own estimate $\hat{x}_i$ and associated error covariance $P_i$, which
are then combined by the same type of zero-memory linear filter as
in the previous structure. Each ship is given more processing power
but the task of the higher hierarchical level is a lot simpler than
for the centralized structure.

The processing algorithms P1 and P2 associated with the ships
include the sonar channel, a Kalman filter and a model of the target
dynamics. Feedback, in the form of the output covariance Q, is
required by the individual ships to compute their optimal controls.
Elaborated information is passed on to the higher level whereas,
previously, raw measurements were communicated.

According to section 4.1, the output covariance is

$$Q = 0.5\,(P_1M_2 + P_2M_1 + P_{12}) , \qquad (180)$$

where $M_i = (P_i - P_{12})(P_1 + P_2 - 2P_{12})^{-1}$.
$P_{12}$ is the cross covariance defined as $P_{12} = E((x-\hat{x}_1)(x-\hat{x}_2)^T)$ and can
be shown to propagate as

$$\dot{P}_{12} = GT_2G^T ,$$

where

$$P_{12}(t_o) = E((x(t_o)-\hat{x}_1(t_o))(x(t_o)-\hat{x}_2(t_o))^T) = E(s_1(t_o)s_2^T(t_o)) = 0 .$$

Thus, it can be seen that, through the use of the mixed strategy $T_2$,
the target is able to control directly the amount of cross covariance
or redundancy in the computation performed by the teammates.

Then, the game is solved using the calculus of variations
approach. The performance index is

$$I(K_2,T_2) = \mathrm{tr}\{X_2(t_f) - 0.5\int_{t_o}^{t_f}Q(t)\,dt\} ,$$

where Q is given by (181). The state equations are

$$\dot{X}_2 = (F+GK_2)X_2 + X_2(F+GK_2)^T + GT_2G^T ,$$

$$\dot{P}_i = (F+GK_2)P_i + P_i(F+GK_2)^T - L_iH_iP_i - P_iH_i^TL_i^T + L_iS_iL_i^T + GT_2G^T ,$$

$$\dot{P}_{12} = GT_2G^T . \qquad (186)$$


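The Hamiltonian is

$$H = \mathrm{tr}\{-0.5\,Q + \Lambda_x((F+GK_2)X_2 + X_2(F+GK_2)^T + GT_2G^T) + \Lambda_{p12}GT_2G^T$$
$$\qquad + \Lambda_{p1}((F+GK_2)P_1 + P_1(F+GK_2)^T - L_1H_1P_1 - P_1H_1^TL_1^T + L_1S_1L_1^T + GT_2G^T)$$
$$\qquad + \Lambda_{p2}((F+GK_2)P_2 + P_2(F+GK_2)^T - L_2H_2P_2 - P_2H_2^TL_2^T + L_2S_2L_2^T + GT_2G^T)\} , \qquad (187)$$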


where the Lagrange matrices $\Lambda_x$, $\Lambda_{p1}$, $\Lambda_{p2}$ and $\Lambda_{p12}$ obey


^12  =  M^M^);  =  0  , 


pl2 


A  =  0.5  (M,My+M-My+M,M^)-A  ,  (F+GK^)  -  (F+GK^) '^A  ,  (188) 

pi  11  2  1  12  pi  2  2  pi 


Ap2  =  0.5  (M^M^+M^M^M+M^mJ)  -  Ap2(F+GK2)-(F+GK2)'^Ap2 


and

$$M_i = (P_i - P_{12})(P_1 + P_2 - 2P_{12})^{-1} . \qquad (182)$$


The necessary conditions of optimality yield the optimal controls
$L_1$ and $L_2$ as

$$\frac{\partial H}{\partial L_i} = (-2H_iP_i + 2S_iL_i^T)\Lambda_{pi} = 0 , \qquad (189)$$

or

$$L_i = P_iH_i^TS_i^{-1} . \qquad (190)$$

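The target controls are given by

$$\frac{\partial H}{\partial K_{2ij}} = \{(2X_2\Lambda_x + 2P_1\Lambda_{p1} + 2P_2\Lambda_{p2})G\}_{ji} \quad \begin{cases} > 0 \rightarrow K_{2ij} = K_{ij\min} \\ < 0 \rightarrow K_{2ij} = K_{ij\max} \end{cases} \qquad (191)$$

and

$$\frac{\partial H}{\partial T_{2ij}} = \{G^T(\Lambda_x + \Lambda_{p1} + \Lambda_{p2} + \Lambda_{p12})G\}_{ji} \quad \begin{cases} > 0 \rightarrow T_{2ij} = 0 \\ < 0 \rightarrow T_{2ij} = T_{ij\max} \end{cases} \qquad (192)$$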

4.3  Comparison  of  the  structures 

In order to make a decision concerning the choice of structure,
$I(K_2,T_2)$ is to be compared with $J(K_1,T_1)$. A possible way involves
extensive simulation for the very example under study. Here, an
analytical method aimed at deriving sufficient conditions is chosen
instead. For all (K,T), the Nash optimal inequalities,

$$I(K_2,T_2) \le I(K,T) , \qquad J(K_1,T_1) \le J(K,T) , \qquad (193)$$

hold since the target, controlling K and T, is the minimizing
player; in particular

$$I(K_2,T_2) \le I(K_1,T_1) , \qquad J(K_1,T_1) \le J(K_2,T_2) , \qquad (194)$$

and the above problem is solved if $I(K_1,T_1)$ and $J(K_1,T_1)$, or $I(K_2,T_2)$
and $J(K_2,T_2)$, can be compared. The four payoffs of (194) can be
rearranged according to the matrix form in Table 4. If $I(K_1,T_1) \le J(K_1,T_1)$,
then, considering (194), the above matrix game admits a
Nash equilibrium in pure strategies corresponding to the centralized
choice of structure. On discrete games, among others, Luce and
Raiffa [31] can be referred to. Conversely, if $J(K_2,T_2) \le I(K_2,T_2)$,
then the decentralized choice made by both parties is the Nash
optimal one.


Table 4. Payoff matrix.

Structure of the            Tracking structure
(minimizing) target     centralized      decentralized

centralized             J(K_1,T_1)       I(K_1,T_1)
decentralized           J(K_2,T_2)       I(K_2,T_2)
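
The pure-strategy equilibrium test applied to Table 4 can be sketched as a saddle-point search. The code below is an illustration under stated assumptions: the function name `pure_saddle_points` is hypothetical, rows are the target's (minimizing) choice of structure, columns are the tracking team's (maximizing) choice, and the payoff numbers are invented so that they satisfy (194); they are not results from the report.

```python
def pure_saddle_points(payoff):
    """Return (row, col) pairs where the entry is minimal in its column
    (the row player minimizes) and maximal in its row (the column player maximizes)."""
    points = []
    for r, row in enumerate(payoff):
        for c, value in enumerate(row):
            column = [payoff[k][c] for k in range(len(payoff))]
            if value == min(column) and value == max(row):
                points.append((r, c))
    return points

# Rows: target assumes centralized / decentralized.
# Columns: trackers use centralized / decentralized.
# Layout follows Table 4: [[J(K1,T1), I(K1,T1)], [J(K2,T2), I(K2,T2)]].
case_centralized = [[0.60, 0.50],
                    [0.80, 0.40]]     # I(K1,T1) <= J(K1,T1)
case_decentralized = [[0.30, 0.50],
                      [0.35, 0.45]]   # J(K2,T2) <= I(K2,T2)

print(pure_saddle_points(case_centralized))    # [(0, 0)]
print(pure_saddle_points(case_decentralized))  # [(1, 1)]
```

In the first made-up matrix the equilibrium is the jointly centralized choice; in the second it is the jointly decentralized one, matching the two sufficient conditions stated in the text.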

Consequently, by studying the perfect information game in which
both players are bound to jointly choose either structure, and using
the above approach, the solution of the non-perfect information game
in which neither player is sure of the other's choice of structure
is also obtained.

$I(K_1,T_1)$ is the performance achieved when the target plays the
strategies $K_1$, $T_1$, optimal if the tracking team adopts the centralized
structure, when, actually, the tracking team adopts the decentralized
structure, based on the assumption that the target plays accordingly.
Therefore, the tracking filters are not fitted to the real control
policy $K_1$, $T_1$. Let $I^*(K,T)$ be the payoff of the filter optimized with
respect to the very pair (K,T). Then, due to optimality,

$$I(K,T) \le I^*(K,T) \qquad (195)$$

holds for the maximizing tracking team. And, in particular,

$$I(K_1,T_1) \le I^*(K_1,T_1) , \qquad (196)$$

which, together with (194), yields

$$I(K_2,T_2) \le I^*(K_1,T_1) . \qquad (197)$$

$(K_2,T_2)$ is the control pair assumed by the tracking team to optimize
I. Therefore,

$$I^*(K_2,T_2) = I(K_2,T_2) \qquad (198)$$

and, eventually,

$$I^*(K_2,T_2) \le I^*(K_1,T_1) . \qquad (199)$$

A similar demonstration performed on $J(K_2,T_2)$ gives

$$J^*(K_1,T_1) \le J^*(K_2,T_2) . \qquad (200)$$

(199) and (200) form a set equivalent to (194). Therefore,

$$J^* \le I^* \Rightarrow J(K_2,T_2) \le I(K_2,T_2) . \qquad (201)$$

Though more restrictive in the conclusions, working with $I^*$ and $J^*$
has the advantage that the performance computed by the tracking
team and the real performance do coincide, thereby simplifying the
computation task attempted in the next section.

As pointed out previously, neither the target nor the tracking
team has access to the real value of $I(K_1,T_1)$ or $J(K_1,T_1)$ during the
actual tracking phase; only a referee, having free access to both
sides, could compute these values. The performances computed by
either side differ since they are based upon different assumptions.
If a decision had to be taken according to these quantities, it would
define a non-zero sum game which translates into a bimatrix game.
This approach, a little more involved, though more satisfying for
the players, is the one adopted in the example of Section 7.

5. COMPARING I(K_1,T_1) WITH J(K_1,T_1)

The target plays according to the centralized structure,
corresponding to equations (159), (160), (172) to (179). It thinks it
achieves a payoff computed on the basis of P(t).

The tracking team, on the other hand, plays according to the
decentralized structure, described by equations (180) to (191),
adopting the control policy $L_1 = P_1H_1^TS_1^{-1}$, $L_2 = P_2H_2^TS_2^{-1}$
in (190), and assuming an achievement $I(K_2,T_2)$ computed after $X_2(t_f)$
and Q(t) as in (185).


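Nevertheless, the true performance achieved is

$$I(K_1,T_1) = \mathrm{tr}\{\bar{X}(t_f) - 0.5\int_{t_o}^{t_f}\bar{Q}(t)\,dt\} , \qquad (202)$$

where the over bars denote the true values, such that

$$\dot{\bar{X}} = (F+GK_1)\bar{X} + \bar{X}(F+GK_1)^T + GT_1G^T , \qquad (203)$$

$$\bar{Q} = 0.5\,(\bar{P}_1\bar{M}_2 + \bar{P}_2\bar{M}_1 + \bar{P}_{12}) ,$$

where $\bar{P}_1$ and $\bar{P}_2$ propagate as

$$\dot{\bar{P}}_1 = (F+GK_1)\bar{P}_1 + \bar{P}_1(F+GK_1)^T - P_1H_1^TS_1^{-1}H_1\bar{P}_1 - \bar{P}_1H_1^TS_1^{-1}H_1P_1 + P_1H_1^TS_1^{-1}H_1P_1 + GT_1G^T ,$$

$$\dot{\bar{P}}_2 = (F+GK_1)\bar{P}_2 + \bar{P}_2(F+GK_1)^T - P_2H_2^TS_2^{-1}H_2\bar{P}_2 - \bar{P}_2H_2^TS_2^{-1}H_2P_2 + P_2H_2^TS_2^{-1}H_2P_2 + GT_1G^T . \qquad (204)$$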

The initial conditions are identical. Computing $I(K_1,T_1)$ amounts to
solving in parallel three differential games, some equations being
interdependent as (204) shows. When $I^*(K_1,T_1)$ is studied, the
performance computed by the tracking team is the real performance;
therefore equations (202) to (204) are irrelevant, resulting in
considerable simplifications.

$I(K_1,T_1)$ is to be compared with $J(K_1,T_1)$, which actually
corresponds to the performance assumed by the target and expressed by
(172). Simulation of the above, for even a two-dimensional case,
amounts to 68 scalar differential equations, 16 switching functions
and 24 parameters from trial and error (the 2-vs.-1 game is a 4-point
boundary-value problem). This task must be somewhat duplicated to
compute $J(K_2,T_2)$. The complexity is a characteristic of team games.

Since $\bar{X}(t_o) = X(t_o)$, and (203) and (159) are identical propagation
functions, $\bar{X}(t_f) = X(t_f)$. In other words, since, in both
cases, the target plays the same strategy, it alters its strategy the
same way. Thus, the two integral terms in (202) and (172) are to be
compared. As a sufficient condition, their differential elements
$\bar{Q}(t)$ and P(t) are compared. If $\bar{Q}(t) - P(t)$ is positive definite for
all t in the interval considered (i.e., $\bar{Q}(t) \ge P(t)$), then
$I(K_1,T_1) \le J(K_1,T_1)$, and the centralized structure is to be chosen.
If $\bar{Q}(t) - P(t)$ is negative definite, then the decentralized structure
is best. Since $\bar{Q}(t_o) = P(t_o)$, the study focuses on $\dot{P}$ and $\dot{\bar{Q}}$.

Next,  the  identities 


—  —  -T  — T 

=  I  , 


(205) 


<‘’i  -  ‘”12'^  ■  *  ''2  -  *  "i"’i  -  ^2’  ■ 


are recognized, and $\Delta P_i = P_i - \bar{P}_i$ is introduced as the difference
between the covariance computed by the trackers and the real one.

Then, after some cumbersome algebra,


(206) 


Two remarks are in order at this point. First, the difference
between both equations is seen to come from the feedback control and
from the quadratic term. Then, when the target decides not to use
its mixed strategy, $P_{12}$ remains zero and Q is identical to P; thus,
both structures perform equally.

Matrices $M_i$, $P_i$ are positive definite; therefore, if $B_1$ and $B_2$
are negative definite, then $\bar{Q}(t) \ge P(t)$ is ensured and the centralized
structure chosen.

Except for the term $(F+GK_1)$, both $B_1$ and $B_2$ are negative
definite terms; thus, in order for the centralized structure to be
chosen, $F+GK_{max}$ must not be so positive definite as to force the
integral of Q to be larger than the integral of P. Since it is
difficult to estimate bounds on $B_i - (F+GK_1)$, unless performing
the actual simulation, a sufficient condition to ensure the choice
of the centralized structure is $F+GK_{max} \le 0$ or $GK_{max} \le -F$. As an
example, if F is negative definite and little positive feedback is
used, then the centralized structure is chosen.

When $I^*(K_1,T_1)$ is compared to $J^*(K_1,T_1)$, then $\bar{P}_i = P_i$ and $\Delta P_i = 0$;
thus it is more difficult for $B_1$ and $B_2$ to be negative
definite.

6. COMPARING J(K_2,T_2) WITH I(K_2,T_2)


This is the dual case to the one developed previously. The
actual covariance is $\bar{P}$, where

$$\dot{\bar{P}} = (F+GK_2)\bar{P} + \bar{P}(F+GK_2)^T - PH^TS^{-1}H\bar{P} - \bar{P}H^TS^{-1}HP + PH^TS^{-1}HP + GT_2G^T . \qquad (208)$$

The target assumes a decentralized structure, computing $K_2$, $T_2$
and a covariance Q according to equations (180) to (192), while the
tracking team computes its control $L = PH^TS^{-1}$ according to the
centralized structure described by equations (159), (160), (172) to (179).


Q is compared to $\bar{P}$, using a similar derivation; i.e.,

$$\dot{Q} = (F+GK_2)Q + Q(F+GK_2)^T - QH^TS^{-1}HQ + GT_2G^T - M_1B_1P_{12}M_2^T - M_2B_2P_{12}M_1^T , \qquad (209)$$


where 


B^  =  (F+GK2) 
B2  =  (F+GK2) 
AP  =  P  -  P 


^1^1  ■  ”2^”l^^l2^1  ■ 


A sufficient condition for the decentralized structure to be
chosen is that $B_1$ and $B_2$ be positive definite. Confronting (210)
and (207), the term involving $\Delta P$ plays, in both cases, a favorable
role in forcing a definite conclusion, whereas, if $J^*(K_2,T_2)$ and
$I^*(K_2,T_2)$ are computed, then $\Delta P = 0$ and that factor disappears.
Again, it is difficult to evaluate (210). The decentralized
structure is adopted, for example, when F and $GK_{min}$ are positive definite
matrices of fairly large norm.

As a last remark, at $t = t_f$, T = 0. Thus, for games of short
total duration $t_f - t_o$, no switch in T can occur; then $P_{12} = 0$ and,
consequently, both structures are identical. Otherwise, depending
on the very game studied, in particular the target capabilities
and the initial state, either structure might be chosen.

7. A MODIFIED SCALAR CASE

A simplified scalar, and slightly modified, example is developed
here to illustrate further some particularities of team games.

To focus on the sole study of the effect of the mixed strategy,
the feedback control K is forced to zero; also F = 0, G = 1,
$H_1 = H_2 = 1$, and T is constrained to [0,1], where the maximum
tolerable self-added noise in the target's dynamics is 1.

Then, the state and observation equations for the centralized
case are

$$\dot{x} = \zeta , \qquad z_i = x + s_i , \quad i = 1,2.$$

The two measurements $z_1$ and $z_2$ are combined into the equivalent
measurement $\hat{z}$:

$$\hat{z} = x + s ,$$

where the equivalent noise s has covariance $S = S_1S_2/(S_1 + S_2)$. The
covariances associated with x and the error estimate propagate as

$$\dot{X} = T_1 ,$$

$$\dot{P} = -2LP + L^2S + T_1 . \qquad (213)$$


The performance index and Hamiltonian are

$$J(T_1) = X(t_f) - 0.5\int_{t_o}^{t_f}P(t)\,dt ,$$

$$H = -0.5P + \Lambda_xT_1 + \Lambda_p(-2LP + L^2S + T_1) ,$$

and the necessary conditions of optimality yield the optimal control
variable L as

$$\frac{\partial H}{\partial L} = \Lambda_p(-2P + 2SL) = 0 , \quad \text{or} \quad L = P/S .$$

Substitution back into (213) yields

$$\dot{P} = -P^2/S + T_1 ,$$

while the costate variables propagate as

$$\dot{\Lambda}_x = 0 , \qquad \Lambda_x(t_f) = 1 ,$$

$$\dot{\Lambda}_p = (2P/S)\Lambda_p + 0.5 , \qquad \Lambda_p(t_f) = 0 .$$
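
The scalar error-covariance equation $\dot{P} = -P^2/S + T_1$ is easy to integrate for a prescribed noise policy. The sketch below is an illustration only: the function names, the RK4 scheme, the step count and the numerical values are assumptions, and the noise policy is supplied by the caller rather than computed from the switching law.

```python
import math

def simulate_centralized(P0, S, T_of_t, t0, tf, steps):
    """Integrate dP/dt = -P^2/S + T(t) with classical RK4.

    Returns P(tf) and a trapezoidal estimate of the integral of P over [t0, tf].
    """
    dt = (tf - t0) / steps
    f = lambda t, P: -P * P / S + T_of_t(t)
    P, t, integral = P0, t0, 0.0
    for _ in range(steps):
        k1 = f(t, P)
        k2 = f(t + dt / 2, P + dt * k1 / 2)
        k3 = f(t + dt / 2, P + dt * k2 / 2)
        k4 = f(t + dt, P + dt * k3)
        P_next = P + dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        integral += 0.5 * dt * (P + P_next)
        P, t = P_next, t + dt
    return P, integral

def closed_form_passive(P0, S, t):
    """Exact solution for a passive target (T = 0): 1/P grows linearly with slope 1/S."""
    return 1.0 / (1.0 / P0 + t / S)

def steady_state(S, T):
    """Equilibrium of dP/dt = -P^2/S + T for a constant noise level T."""
    return math.sqrt(S * T)
```

For a passive target the numerical result can be compared with `closed_form_passive`; for a constant noise level the covariance stays at `steady_state(S, T)` if started there.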


The Pontryagin maximum principle is applied to find the optimal
control as

$$\Lambda_x + \Lambda_p < 0 \rightarrow T_1 = 1 ,$$
$$\Lambda_x + \Lambda_p > 0 \rightarrow T_1 = 0 . \qquad (218)$$

Singularities  play  an  important  role  whenever  bang-bang  pol¬ 
icies  are  present  for  differential  games.  A  reference  on  the  sub¬ 
ject  is  provided  by  Forhouar  and  Leondes  [32] . 

A singularity of order n at the switching point arises
whenever the generalized Legendre-Clebsch condition is met, i.e.,


f 


3^ 

3H_ 

3t2 

3t 

0 


(219) 


Furthermore, stationarity along the singular arc implies that

$$\frac{d}{dt}\frac{\partial H}{\partial T} = 0 , \qquad (220)$$

and

$$\frac{d^2}{dt^2}\frac{\partial H}{\partial T} = 0 . \qquad (221)$$

Together with the switching condition, this yields three equations to
fix the three unknowns P, $\Lambda_p$ and T. Then, the equations are solved
forward in time and backwards in time on either side of the singular
arc, yielding the solution.


The  decentralized  structure  has  the  performance  index 


A possibility for a singular arc arises again whenever the switching
function is equal to zero. Stationarity along the singular arc implies
that the first two time derivatives of the switching function be zero.
These three equations cannot fix the five unknowns, namely $P_1$, $P_2$,
$\Lambda_{p1}$, $\Lambda_{p2}$ and $T_2$. In the five-dimensional space spanned by the
unknowns, a manifold is defined of two dimensions, constrained
further by the restrictions on time, $t \in [t_o,t_f]$, and on $T_2 \in [0,1]$.
Reaching such a manifold happens under rarely met initial conditions,
as the simulation proved. The conclusion is that the singularity
does not play a major role in the decentralized structure, unlike the
centralized structure.

To compare both structures, the differential equation governing
the propagation of the error covariance of the decentralized
structure can be computed as

Q  =  -qVs  +  -  2T2Pj^P2/  ^^1 

and  is  compared  with  (213) . 

If  the  target  is  passive,  then  T2  =  0  and  both  structures  are 

equivalent.  Otherwise,  for  T2  =  T^^,  P{t)  ^Q(t)  due  to  the  fact 

2 

that  the  term  ^  bounded  by  0  and  0.25.  Thus, 

*  * 

I  (T)  >_  J  (T)  for  any  given  policy  T,  and,  as  shown  previously,  it 
implies  that  the  decentralized  structure  is  to  be  chosen. 

Simulation was performed for noise levels $S_1 = 8$ and $S_2 = 4$.
Figures 28 and 29 show the optimal solutions corresponding to two
different initial conditions. The resulting control policies differ
widely but a higher covariance in the centralized case is compensated
to some extent by an increase in $X(t_f)$. As expected,
$I(K_2,T_2) \geq J(K_1,T_1)$ is verified.

Figure 28. Optimal trajectories for case 1:
$J(T_1) = 0.731$ (one switch), $I(T_2) = 0.843$ (one switch).

Figure 29. Optimal trajectories for case 2:
$J(T_1) = 0.223$ (singular arc), $I(T_2) = 0.385$ ($T_2 = 0$ uniformly).

More cooperation, resulting in a larger difference between
$J(K_1,T_1)$ and $I(K_2,T_2)$, would have happened if both measurements
had identical noise levels, as previous results by Mohler, Kolodziej
and Bugnon [22] show.

Thus far, the target and the ships were assumed to play according
to either the centralized scheme or the decentralized scheme.
When the choice of the opponent is unknown, both parties must choose
a structure and play accordingly. As explained earlier, in that
non-perfect information case, the performance indices differ and the
game is of the non-zero sum type. It defines a bimatrix game which,
for the case of Figure 28, is given by Table 5.


Table 5.  Performances for non-perfect information game.

                          Ships performance    Target performance
                          Target strategies    Target strategies
                             a       b            a       b
  Ships strategies   a     0.94    0.96         0.94    1.04
                     b     0.84    0.73         0.76    0.73


A  Nash  equilibrium  in  pure  strategies  exists  for  this  example. 
It  is  unique  and  corresponds  to  the  assumption  of  a  decentralized 
structure  made  by  both  parties. 

Actually, the open-loop nature of the structures depicted in
Figures 26 and 27 allows an implementation of both strategies in
parallel by the tracking team. A simple device could compare the
performance indices obtained from both schemes in order to select
the best choice.

Viewed from such a perspective, the above bimatrix game becomes
a Stackelberg or leader-follower game in which the target (the
leader) announces its strategy first, and the ships (the followers)
react accordingly. The Stackelberg equilibrium and the Nash
equilibrium coincide for all five examples run in simulation. The
conclusion is that both the target and the ships should play according
to the decentralized structure, even in the event of non-perfect
information.
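
The equilibrium statements can be checked mechanically from the entries
of Table 5. The sketch below is an illustration only. It assumes that
both entries are costs that each party tries to minimize, that rows are
the ships' strategies and columns the target's, and that strategy b
denotes the decentralized structure. The helper names pure_nash and
stackelberg_target_leads are hypothetical and not part of the original
study.

```python
from itertools import product

STRATS = ("a", "b")

# Entries transcribed from Table 5, keyed by (ships strategy, target strategy).
SHIPS = {("a", "a"): 0.94, ("a", "b"): 0.96, ("b", "a"): 0.84, ("b", "b"): 0.73}
TARGET = {("a", "a"): 0.94, ("a", "b"): 1.04, ("b", "a"): 0.76, ("b", "b"): 0.73}


def pure_nash(ships, target):
    """Pure-strategy Nash equilibria when both parties minimize (assumption)."""
    equilibria = []
    for s, t in product(STRATS, STRATS):
        # Neither party can lower its own cost by deviating alone.
        ships_ok = all(ships[(s, t)] <= ships[(s2, t)] for s2 in STRATS)
        target_ok = all(target[(s, t)] <= target[(s, t2)] for t2 in STRATS)
        if ships_ok and target_ok:
            equilibria.append((s, t))
    return equilibria


def stackelberg_target_leads(ships, target):
    """Target announces first; ships reply with their best response."""
    best = None
    for t in STRATS:
        s = min(STRATS, key=lambda s2: ships[(s2, t)])
        if best is None or target[(s, t)] < target[best]:
            best = (s, t)
    return best


nash = pure_nash(SHIPS, TARGET)
leader_follower = stackelberg_target_leads(SHIPS, TARGET)
```

Under these assumptions, both procedures single out the pair (b, b),
which agrees with the statement that the Nash and Stackelberg
equilibria coincide for this example.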

a.  CONCLUSION 

A stochastic team differential game is presented, in which the
target, by the use of a mixed strategy, has a direct control over
the cross correlation between the members of the tracking team. The
study is conducted in the general case and two important features of
team games are demonstrated:

i) Depending on the kind of cooperation allowed between the team
members, various games can be defined. They range from independent
players up to totally collaborating members. In the latter case,
certain choices must be made in the game structure which are not
readily apparent in the 1-vs.-1 case.

ii) Hierarchical structures naturally arise with the classical
trade-off between performance and computational burden.

Two particular game structures, i.e., a centralized and a
decentralized one, are compared, yielding a matrix-game study in a
non-perfect information frame. The choice of structure can be
expressed as a (hierarchical) Stackelberg game due to the open-loop
nature of the problem. It is remarkable that the same problem can
be studied as a zero-sum matrix game, a non-zero sum bimatrix game
and a Stackelberg game.

Complexity and dimensionality are major issues in team games.
For example, to use the same approach as in this chapter, there are
15 ways to combine the measurements of a 4-vs.-1 team game, resulting
in a 15 by 15 matrix game!
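
One counting that reproduces this figure, offered here as an assumption
rather than as the stated derivation, is the number of ways to partition
the team's measurements into groups that are processed jointly (the Bell
number). It gives 2 structures for two ships (centralized and
decentralized) and 15 for four. A minimal sketch, with the hypothetical
helper name bell_number:

```python
def bell_number(m):
    """Number of set partitions of m items, computed with the Bell triangle."""
    row = [1]
    for _ in range(m - 1):
        # Each new row starts with the last entry of the previous row.
        nxt = [row[-1]]
        for x in row:
            nxt.append(nxt[-1] + x)
        row = nxt
    return row[-1]


structures_two_ships = bell_number(2)
structures_four_ships = bell_number(4)
```

The count grows quickly with the team size, which is consistent with
the concern about dimensionality raised above.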

Both structures yield equivalent results whenever the target
does not use a mixed strategy, or is passive. Otherwise, sufficient
conditions, that depend on the game dynamics, are derived under
which one or the other structure is to be chosen.

The centralized structure might be viewed as a convenient way
to alleviate difficulties by pushing the game study up to a unique
higher player in the hierarchy. Nevertheless, the advantages of
the decentralized structure are numerous. Among others, the structure
is more practical, by distributing the computational burden among
the players. Thus it can be adapted to several team configurations
or can recover better from individual failures. Also, the example
shows that it is a lot less likely that a singularity will be
encountered in a team game.

methods fail in the case where coherent signals are received, as
shown below.

Eigenstructure based methods have been successfully applied to
the harmonic retrieval problems (Pisarenko [33]). Likelihood methods
are used to derive detection tests for the number of sources, as
studied by several authors: Bienvenu and Kopp [34], Rissanen [35],
Wax and Kailath [36]. On the other hand, the related problem of
estimators for the location of the sources has been studied by
Bienvenu and Kopp [34] and Henderson [37].

These direction-finding methods based on the eigenstructure
usually rely on the hypothesis that the signals are not coherent, i.e.,
not fully correlated. Unfortunately, coherent sources occur frequently
in practical problems, in case of jamming or in case of
multipath propagation. When this hypothesis is no longer valid, the
spectral density matrix of the sources becomes singular. This results
in an inconsistency in the eigenstructure method: though u
sources are detected, the associated direction vectors are not
proper. Then, for example, linear processing techniques performed
by Henderson [37] propagate the singularity of the source spectral
density matrix, and the methods described fail.

Shan, Wax and Kailath [38] propose a method to recover from
coherent sources by averaging spectral densities computed from
different linear subarrays, taking advantage of the regular spacing of
the sensors. This spatial smoothing approach trades off, in fact,
half the effective aperture to recover from the possible coherence of
the sources.


The nonlinear approach developed below solves the case where
all the sources are coherent. It can be applied to any type of
array and shows drastic improvements in terms of minimal aperture
when the number of coherent sources is high. Multiplicative nonlinear
signals are built by convolution operations on the data to provide a
sufficient number of Mth-order direction vectors to solve the
problems. In general, the minimum number of sensors required to
solve a particular problem is theoretically independent of the total
number of sources, but rather depends on the number of non-coherent
sources. The nonlinear method also provides the coherence coefficients
between the sources, unlike the method by Shan, Wax and Kailath [38].
Formulae relating the number of sources to the order of the nonlinear
method required are also provided. A brief discussion of the
practical limitations of the method concludes the chapter.

2.  PROBLEM  STATEMENT 

n wideband sources $s_k$ are impinging on a linear array from
directions $\theta_k$. The linear array consists of d sensors located at
distances $D_i$ from sensor 1, such that $0 = D_1 < \dots < D_d$. Then the
signals $r_i$ received at sensor i are

$r_i(t) = \sum_{k=1}^{n} s_k\left(t - \frac{D_i}{c} \sin\theta_k\right) + n_i(t)$ ,   (227)

where c is the speed of propagation, and the additive noises at the
ith sensor, $n_i(t)$, are independent, identically distributed
Gaussian noises. When the noise field is unknown, techniques described
by Paulraj and Kailath [39] can be applied.


For a finite observation time, the vectors R, S, N are defined as

$R(\omega) = [R_1(\omega), \dots, R_d(\omega)]$ ,
$S(\omega) = [S_1(\omega), \dots, S_n(\omega)]$ ,
$N(\omega) = [N_1(\omega), \dots, N_d(\omega)]$ ,

where $R_i(\omega)$, $S_k(\omega)$ and $N_i(\omega)$ are the Fourier transforms
of $r_i(t)$, $s_k(t)$ and $n_i(t)$. Then the Fourier transform of (227)
yields

$R_i(\omega) = \sum_{k=1}^{n} e^{-j\omega\tau_{ik}} S_k(\omega) + N_i(\omega)$ ,   (229)

where $\tau_{ik} = \frac{D_i}{c} \sin\theta_k$. More generally,

$R(\omega) = A_1(\omega) S(\omega) + N(\omega)$ ,   (230)


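and matrix $A_1(\omega)$ is the (first-order) direction matrix

$A_1(\omega) = \begin{bmatrix} 1 & \cdots & 1 \\ e^{-j\omega\tau_{21}} & \cdots & e^{-j\omega\tau_{2n}} \\ \vdots & & \vdots \\ e^{-j\omega\tau_{d1}} & \cdots & e^{-j\omega\tau_{dn}} \end{bmatrix}$ ,   (231)

whose columns are the direction vectors of sources $s_k$,

$d_k(\omega) = [1, e^{-j\omega\tau_{2k}}, \dots, e^{-j\omega\tau_{dk}}]$ .   (232)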


Multiplying (230) by its conjugate transpose, and taking
expectations, for a sufficiently long observation time, the result
converges in the mean to the spectral density matrix with the usual
covariance problems associated with the periodogram estimate for
finite-length samples. Now this estimate of the (first-order)
spectral density $L_1(\omega)$ is given by

$L_1(\omega) = A_1(\omega) E(S(\omega)\bar{S}(\omega)) \bar{A}_1(\omega) + E(N(\omega)\bar{N}(\omega))$ ,   (233)

where the overbar denotes the conjugate transpose. Or, dropping the
arguments for short,

$L_1 = A_1 P_1 \bar{A}_1 + \sigma^2 I$ ,   (234)

with

$P_1 = E(S(\omega)\bar{S}(\omega))$   (235)

and $\sigma^2$ is the Gaussian noise spectral density coefficient.

Equation (234) has the desired structure to apply the eigenstructure
method. If $P_1$ has rank n, i.e., if no two sources are
coherent, the d by d spectral density matrix has n eigenvalues
larger than $\sigma^2$ and d - n identical eigenvalues equal to $\sigma^2$. The
associated eigenvectors are orthogonal to the columns of the direction
matrix $A_1$, i.e., to the direction vectors.
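
The eigenstructure property, and the way it degrades for coherent
sources, can be illustrated numerically. The following is a minimal
sketch using only the Python standard library, for d = 3 sensors and
n = 2 sources. The sensor positions, angles, noise level, the
normalization omega/c = pi and the coherence coefficient alpha are
arbitrary illustrative values, and all function names are hypothetical.

```python
import cmath
import math


def direction_vector(theta, positions, omega=math.pi, c=1.0):
    """Entries exp(-j*omega*D_i*sin(theta)/c) for sensor positions D_i."""
    return [cmath.exp(-1j * omega * D * math.sin(theta) / c) for D in positions]


def spectral_density(vectors, P, sigma2):
    """L = A P A^H + sigma2 * I, the direction vectors being the columns of A."""
    d, n = len(vectors[0]), len(vectors)
    L = [[complex(sigma2) if i == j else 0j for j in range(d)] for i in range(d)]
    for i in range(d):
        for j in range(d):
            for k in range(n):
                for l in range(n):
                    L[i][j] += vectors[k][i] * P[k][l] * vectors[l][j].conjugate()
    return L


def inner(a, v):
    """Hermitian inner product a^H v."""
    return sum(x.conjugate() * y for x, y in zip(a, v))


def orthogonal_to(a, b):
    """For d = 3: a vector v with a^H v = b^H v = 0 (conjugated cross product)."""
    cross = [a[1] * b[2] - a[2] * b[1],
             a[2] * b[0] - a[0] * b[2],
             a[0] * b[1] - a[1] * b[0]]
    return [x.conjugate() for x in cross]


def eigen_residual(L, v, lam):
    """Largest component of L v - lam v (zero if v is an eigenvector)."""
    return max(abs(sum(L[i][j] * v[j] for j in range(len(v))) - lam * v[i])
               for i in range(len(v)))


positions = [0.0, 1.0, 2.0]                 # d = 3 sensors, D_1 = 0
a1 = direction_vector(0.3, positions)
a2 = direction_vector(-0.7, positions)
sigma2 = 0.1

# Non-coherent sources: P has rank n = 2.
L_nc = spectral_density([a1, a2], [[1.0, 0.0], [0.0, 2.0]], sigma2)
v = orthogonal_to(a1, a2)
res_noncoherent = eigen_residual(L_nc, v, sigma2)

# Coherent sources s_1 = alpha * s_2: P has rank 1.
alpha = 0.5 + 0.5j
P_c = [[abs(alpha) ** 2, alpha], [alpha.conjugate(), 1.0]]
L_c = spectral_density([a1, a2], P_c, sigma2)
d_c = [alpha * x + y for x, y in zip(a1, a2)]    # compound direction vector
w = orthogonal_to(d_c, [1.0 + 0j, 0j, 0j])
res_coherent = eigen_residual(L_c, w, sigma2)
leak = abs(inner(a1, w))                         # w is not orthogonal to a1
```

In the non-coherent case the vector orthogonal to both direction vectors
is an eigenvector with eigenvalue sigma2. In the coherent case a vector
orthogonal only to the compound vector is also such an eigenvector, so
orthogonality to the individual direction vectors is lost.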

If two sources are coherent, e.g., $s_1 = \alpha s_2$, then $P_1$ has rank
n - 1 and the d - n + 1 eigenvectors are not orthogonal to the
direction vectors as defined in (232) but are orthogonal to the
d by (n - 1) direction matrix
with a compound direction vector in the first column, as a function
of three unknowns, namely $\theta_1$, $\theta_2$ and $\alpha$. This matrix has one
less column compared to (231).

If there are only d = n + 1 sensors, then there are only two
eigenvalues equal to $\sigma^2$ with eigenvectors that are orthogonal to the
compound direction vector. This provides two equations that cannot
fix the three unknowns and the problem cannot be solved using this
method, as pointed out in [38].


3.  THE  EIGENSTRUCTURE  GENERATING  FUNCTION 


Mth-order signals, from $r_{i,1}(t) = r_i(t)$ up to $r_{i,M}(t)$, are
constructed from the definition

$r_{i,M}(t) = r_{i,M-1}(t) * r_i(t)$ ,   (237)


where * denotes the convolution operator. Then, the corresponding
Mth-order vector $R_M$ is

$R_M(\omega) = [R_1^M(\omega), R_2^M(\omega), \dots, R_d^M(\omega)]$ ,   (238)
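
The link between repeated convolution in time and the Mth power in
frequency can be checked on a short sequence. The sketch below is an
illustration under simple assumptions: discrete, finite-length samples,
a naive DFT written out by hand, and hypothetical function names. The
transform length is chosen long enough to hold the full M-fold
convolution.

```python
import cmath


def convolve(x, y):
    """Linear (full) convolution of two finite sequences."""
    out = [0.0] * (len(x) + len(y) - 1)
    for i, xi in enumerate(x):
        for j, yj in enumerate(y):
            out[i + j] += xi * yj
    return out


def mth_order_signal(r, M):
    """r convolved with itself until M factors are present."""
    out = list(r)
    for _ in range(M - 1):
        out = convolve(out, r)
    return out


def dft(x, size):
    """Naive discrete Fourier transform of x zero-padded to `size` samples."""
    padded = list(x) + [0.0] * (size - len(x))
    return [sum(v * cmath.exp(-2j * cmath.pi * k * t / size)
                for t, v in enumerate(padded))
            for k in range(size)]


r = [1.0, -0.5, 0.25, 2.0]      # arbitrary first-order samples
M = 3
r_M = mth_order_signal(r, M)    # Mth-order signal in the time domain
size = len(r_M)

lhs = dft(r_M, size)                    # transform of the Mth-order signal
rhs = [R ** M for R in dft(r, size)]    # first-order transform raised to M
max_error = max(abs(a - b) for a, b in zip(lhs, rhs))
```

The Mth-order sequence is roughly M times longer than the original one,
and its transform agrees with the component-wise Mth power of the
first-order transform up to rounding error.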


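and it follows that the Mth-order spectral density matrix is defined
as

$L_M(\omega) = E[R_M(\omega) \bar{R}_M(\omega)]$ ,   (239)

$L_M = E \begin{bmatrix} (R_1\bar{R}_1)^M & (R_1\bar{R}_2)^M & \cdots \\ (R_2\bar{R}_1)^M & (R_2\bar{R}_2)^M & \cdots \\ \vdots & \vdots & \ddots \end{bmatrix}$ .   (240)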


Introducing a new matrix operator $\Delta$, $R_M$ is defined as the
integer Mth power of R,

$R_M(\omega) = R(\omega)^{\Delta M}$ ,

in terms of $\Delta$ products. E.g., when M = 2, each matrix element is
simply squared.

$\Delta$ performs a component-to-component multiplication, i.e.,

$A \Delta B = C \rightarrow c_{ij} = a_{ij} \cdot b_{ij}$ ,   (242)

where it is implicit that matrices A, B and, of course, C are of
identical dimensions.

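Applied to p by q matrices, useful identities are

$A \Delta B = B \Delta A$ ,
$A \Delta (B \Delta C) = (A \Delta B) \Delta C$ ,
$\overline{A \Delta B} = \bar{A} \Delta \bar{B}$ ,   (243)
$(A + B) \Delta C = A \Delta C + B \Delta C$ ,
$I \Delta I = I$ ,

and for n by 1 vectors,

$(A \Delta B)(\overline{C \Delta D}) = (A\bar{C}) \Delta (B\bar{D}) = (A\bar{D}) \Delta (B\bar{C})$ ,
$(A\bar{B})^{\Delta M} = A^{\Delta M} \bar{B}^{\Delta M}$ ,

and, by definition, it is assumed that

$(A\bar{D}) \Delta (B\bar{C}) = A\bar{D} \Delta B\bar{C}$ .   (245)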
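
The vector identities can be verified numerically. The sketch below
treats the operator as a component-to-component product and reads the
products of n by 1 vectors as outer products with the conjugate
transpose, which is an assumption about the intended notation. The
vectors are arbitrary and the function names are hypothetical.

```python
def hadamard(A, B):
    """Component-to-component product of two matrices of equal size."""
    return [[a * b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def hadamard_power(A, M):
    """Each component of A raised to the integer power M."""
    return [[a ** M for a in row] for row in A]


def outer(a, b):
    """Outer product a b^H of two vectors given as lists."""
    return [[x * y.conjugate() for y in b] for x in a]


def max_diff(X, Y):
    """Largest absolute component of X - Y."""
    return max(abs(x - y) for rx, ry in zip(X, Y) for x, y in zip(rx, ry))


a = [1 + 2j, -0.5j, 3.0 + 0j]
b = [2 - 1j, 1 + 1j, -1.0 + 0j]
c = [0.5 + 0j, 1 - 2j, 2j]
d = [-1 + 1j, 0.25 + 0j, 1 + 0.5j]
M = 3

# (a b^H) raised to the component-wise power M equals the outer product
# of the component-wise powers of a and b.
err_power = max_diff(hadamard_power(outer(a, b), M),
                     outer([x ** M for x in a], [y ** M for y in b]))

# (a . b)(c . d)^H equals (a c^H) . (b d^H), with "." the component product.
ab = [x * y for x, y in zip(a, b)]
cd = [x * y for x, y in zip(c, d)]
err_mixed = max_diff(outer(ab, cd), hadamard(outer(a, c), outer(b, d)))
```

The first check is the property that lets the Mth-order spectral density
be written as the component-wise Mth power of the first-order outer
product.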


Applied to complex scalars, $\Delta$ is the regular scalar multiplication,
i.e.,

$t^{\Delta n} = t^n$ .   (246)

The $\Delta$-exponential of matrix A takes a form analogous to the
classical matrix exponential:

$e_\Delta(A) = \sum_{M=0}^{\infty} A^{\Delta M} / M!$ ,   (247)

where the regular matrix product has been replaced by the $\Delta$-product.

Let an Mth-order spectral-density generating function be given
by

$\Phi_{\Delta R}(t) = E[e_\Delta(R\bar{R}t)]$ .   (248)

Then, using definition (247) and properties (244), (248) yields

$\Phi_{\Delta R}(t) = \sum_{M=0}^{\infty} E((R\bar{R})^{\Delta M}) \, t^M / M!$ ,   (249)

where

$L_M = E((R\bar{R})^{\Delta M})$   (250)

is immediately identified. Thus

$\Phi_{\Delta R}(t) = \sum_{M=0}^{\infty} L_M \, t^M / M!$ ,   (251)

and hence the name of Mth-order spectral-density generating function.


As an example, $L_2$ is computed as

$L_2 = E((R\bar{R})^{\Delta 2}) = E((A_1 S\bar{S}\bar{A}_1 + A_1 S\bar{N} + N\bar{S}\bar{A}_1 + N\bar{N})^{\Delta 2})$ .   (252)

Then, using the independence of the noises with respect to the sources,
and for Gaussian noises, odd powers of N and $\bar{N}$ are averaged out.
Moreover, terms such as $E(N \Delta N)$ and $E(\bar{N} \Delta \bar{N})$ go
to zero since

$E(N_i \Delta N_i) = \mathcal{F}(E(n_i(t) * n_i(t)))$ ,   (253)

where $\mathcal{F}$ is the Fourier transform operator. $E(n_i(t) * n_i(t))$
is the correlation estimate between $n_i(t)$ and $n_i(-t)$ and is expected
to be zero since, for any time shift $\tau$, at most one point is correlated.


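Thus $L_2$ is equal to

$L_2 = B_2 + 4 B_1 \Delta \sigma^2 I + E((N\bar{N})^{\Delta 2})$ ,   (254)

where

$B_M = E((A_1 S\bar{S}\bar{A}_1)^{\Delta M})$ ,   (255)

and, for Gaussian noises,

$E((N\bar{N})^{\Delta M}) = \sigma^{2M} \frac{(2M)!}{M! \, 2^M} I$ .   (256)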


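More generally, it can be shown that

$L_M = \sum_{p=0}^{M} B_p \binom{M}{p}^2 \Delta E((N\bar{N})^{\Delta (M-p)})$ ,   (257)

with $B_0 = I$, and $\binom{M}{p}$ is the binomial coefficient, or

$L_M = \sum_{p=0}^{M} B_p \Delta \sigma^{2(M-p)} \frac{M!^2 \, (2(M-p))!}{(M-p)!^3 \, p!^2 \, 2^{M-p}} I$ .   (258)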


Let $K_M$ be the Mth-order eigenstructure matrix defined as

$K_M = B_M + \sigma^{2M} I$ .   (259)


Then $K_M$ can be computed from $L_M$ by the recurrent relation

$K_M = L_M - \sum_{p=1}^{M-1} K_p \Delta \sigma^{2(M-p)} \frac{M!^2 \, (2(M-p))!}{(M-p)!^3 \, p!^2 \, 2^{M-p}} I
      + \sum_{p=1}^{M-1} \sigma^{2M} \frac{M!^2 \, (2(M-p))!}{(M-p)!^3 \, p!^2 \, 2^{M-p}} I
      - \sigma^{2M} \frac{(2M)!}{M! \, 2^M} I + \sigma^{2M} I$ .   (260)

$K_M$ and $L_M$ differ only by their diagonal components. $\sigma^2$ is
estimated as the smallest eigenvalue of $L_1$.
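
The recurrence can be exercised on synthetic data. The sketch below
rests on one reading of relations (256) to (260): the weight of the
order-p term is the squared binomial coefficient times (2m)!/(m! 2^m)
with m = M - p, the terms with p < M act on the diagonal only, and the
p = M term is the full matrix B_M. The matrices B_p are arbitrary test
values and all function names are hypothetical.

```python
from math import comb, factorial


def noise_moment(m):
    """(2m)! / (m! 2^m), the Gaussian moment coefficient."""
    return factorial(2 * m) // (factorial(m) * 2 ** m)


def weight(M, p):
    """Squared binomial coefficient times the noise moment of order M - p."""
    return comb(M, p) ** 2 * noise_moment(M - p)


def identity(d):
    return [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]


def diagonal(X):
    """Component product of X with the identity: keeps the diagonal only."""
    d = len(X)
    return [[X[i][j] if i == j else 0.0 for j in range(d)] for i in range(d)]


def axpy(X, Y, a):
    """Return X + a * Y for square matrices stored as nested lists."""
    d = len(X)
    return [[X[i][j] + a * Y[i][j] for j in range(d)] for i in range(d)]


def build_L(B, sigma2):
    """Mth-order spectral densities from B[0] = I, B[1], ..., B[M_max]."""
    L = {}
    for M in range(1, len(B)):
        L_M = [row[:] for row in B[M]]
        for p in range(M):
            L_M = axpy(L_M, diagonal(B[p]), weight(M, p) * sigma2 ** (M - p))
        L[M] = L_M
    return L


def recover_K(L, sigma2):
    """Eigenstructure matrices K_M obtained recursively from the L_M."""
    d = len(L[1])
    K = {}
    for M in sorted(L):
        K_M = [row[:] for row in L[M]]
        for p in range(1, M):
            w = weight(M, p)
            K_M = axpy(K_M, diagonal(K[p]), -w * sigma2 ** (M - p))
            K_M = axpy(K_M, identity(d), w * sigma2 ** M)
        K_M = axpy(K_M, identity(d), (1 - weight(M, 0)) * sigma2 ** M)
        K[M] = K_M
    return K


sigma2 = 0.5
B = [identity(2),
     [[2.0, 1.0], [1.0, 3.0]],
     [[5.0, 2.0], [2.0, 7.0]],
     [[9.0, 4.0], [4.0, 11.0]]]
L = build_L(B, sigma2)
K = recover_K(L, sigma2)
```

With these conventions each recovered K_M equals B_M plus sigma2 to the
power M on the diagonal, and its off-diagonal components coincide with
those of L_M.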


Then, the eigenstructure generating function is defined as

$\Phi_{\rho R}(t) = \sum_{M=0}^{\infty} K_M \, t^M / M! = E(e_\rho(R\bar{R}t))$   (262)

or

$K_M = E((R\bar{R})^{\rho M})$ ,   (263)

where the $\rho$-operator is defined a posteriori from relations (260)
and (262).

When  all  the  sources  are  coherent,  is  composed  of  a  unique 
compound  vector  and 


where 


(26. 


is  the  Mth-order  compound  vector  and 


is  now  a  scalar  (dimension  1) . 

Then the d by d Mth-order eigenstructure matrices $K_M$ have the
prescribed structure

$K_M = A_M P_M \bar{A}_M + \sigma^{2M} I$ .   (266)

Thus 1 of the d eigenvalues is larger than $\sigma^{2M}$, the remaining d - 1
eigenvalues are equal to $\sigma^{2M}$ and the associated eigenvectors are
orthogonal to the columns of matrices $A_M$, i.e. to the Mth-order
direction vectors, thereby yielding, for each order M = 1, 2, ...
another set of equations which are functions of the unknown
parameters.

Moreover, the Mth-order direction vector has for components the
components of the corresponding first-order direction vector
raised to power M. Thus they are, in general, independent vectors
and functions of the very same parameters.

The Mth-order eigenstructure method is applied to matrices $K_1$,
$K_2, \dots, K_M$. Consequently there are no theoretical limitations to M,
nor to the number of equations provided by the method. The only
constraint is that there must be at least one eigenvalue equal to
$\sigma^{2M}$, and thus $d - n \geq 1$ is enforced, i.e. $d \geq 2$ where n = 1 is
now the order of the source spectral-density matrix, which is the number
of non-coherent sources, 1 in the case where all sources are coherent.

4.  MINIMUM  SENSOR  CONFIGURATIONS 


The incident source signals impinging on the array are classified
by (n:u,c,k) where

n is the total number of sources,
u is the number of non-coherent sources,
c is the number of coherent sources,
k is the maximum number of coherent sources present in
  uncorrelated groups of sources, and k is smaller than n.

Then it can be shown that

$n = u + c$ ,
$k + 1 \in [\lfloor n/u \rfloor, c + 1]$ for u > 1 ,   (267)
$k + 1 = c + 1$ for u = 1 ,

where $\lfloor n/u \rfloor$ is the greatest integer smaller than or equal
to n/u.

For an array of d sensors, receiving signals from n directions,
the classical, i.e. first-order, method is applied first. If coherent
sources are present, the u non-coherent sources are detected and only
$q \leq u$ orthogonal proper direction vectors are computed. Then there
are n - u = c coherent sources that are a function of the u - q
non-coherent ones. It must be emphasized that, at this point, c is to
be guessed by the experimenter. Nevertheless, a large value for c
solves all problems where c is smaller but at the expense of more
computation. A trial and error procedure, choosing first c = 1
and increasing to c = 2 if no match is found, is also possible.

The nonlinear method is applied when all the sources are coherent.
In that case u = 1 and k = n - 1. The sources are

$s_i(t) = \alpha_i s_1(t)$ ,   (268)

where $\alpha_i$, i = 1, 2, ..., n, are complex constants, $\alpha_1 = 1$. The
compound direction vector is

$A_1 = d_c = \sum_{i=1}^{n} \alpha_i d_i$ ,   (269)

where $d_i$ are the uncompounded direction vectors as defined in equation
(231). $d_c$ is a function of 2n - 1 unknowns: n - 1 coherence
coefficients $\alpha_i$ and n direction parameters $\theta_i$.

In the minimum configuration case, there is only one eigenvalue
equal to $\sigma^{2M}$ and thus, each order produces one more equation from
the orthogonality of the corresponding eigenvector with the
Mth-order compound vector $d_c^{\Delta M}$.

Consequently, the minimum configuration to solve the problem
is

$d = u + 1 = 2$ ,
$M = 2n - 1$ .   (270)


When d = n + 1, then there are at least n minimum eigenvalues
equal to $\sigma^{2M}$. As a consequence, n new equations are produced from
each order. Thus the second-order nonlinear method produces 2n
equations that solve the problem in its entirety.
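
The counting argument can be written out as a small calculation. The
sketch below assumes, as the text does, that each eigenvector associated
with the smallest eigenvalue contributes one equation per order, so that
d - 1 equations are produced per order when all n sources are coherent,
against 2n - 1 unknowns. The function name minimal_order is hypothetical.

```python
def minimal_order(n, d):
    """Smallest order M whose equations cover the 2n - 1 unknowns.

    n: number of (fully coherent) sources, d: number of sensors.
    Assumes d - 1 new equations per order.
    """
    if n < 1 or d < 2:
        raise ValueError("need at least one source and two sensors")
    unknowns = 2 * n - 1
    per_order = d - 1
    # Ceiling division without floating point.
    return -(-unknowns // per_order)


order_two_sensors = minimal_order(3, 2)     # d = 2
order_n_plus_one = minimal_order(3, 4)      # d = n + 1
order_two_n = minimal_order(3, 6)           # d = 2n
```

For n = 3 this gives M = 2n - 1 = 5 with two sensors, M = 2 with
d = n + 1 sensors, and M = 1 with d = 2n sensors, in line with the
configurations discussed in the text.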

More generally, if $d_1$ is the minimum number of sensors required
by the method described in [38], $d_2$ by the second-order nonlinear
method and $d_M$ by the Mth-order nonlinear method, then

$d_1 = n + k + 1 = 2n$


Figure 30 compares minimum sensor configurations that allow the
problem to be solved for these three methods. Gains in terms of sensors
are substantial for a high number of coherent sources. The required
value of M in the Mth-order method is indicated in parentheses.


Figure  30.  Minimum  sensor  configuration  for  n  coherent  sources. 


5.  CONCLUSION 


The Mth-order nonlinear method yields, theoretically, the receiving
angles and the coherence coefficients in the case of coherent
sources. For a constant number of sensors, an increased number of
coherent sources is usually followed by an increase in the order of
the nonlinear method required to solve the problem.

Practically, for a finite observation time T, the Mth-order
signals have lengths $M \cdot T$, but even though they are longer, they do
not converge any better than the first-order signals to the Mth-order
spectral-density matrices. Actually, for a short observation
time, the repeated convolutions have the drawback of enhancing the
irregularities in the noise. This clearly sets a limitation on the
practical order of the method, for a given observation time, added
to the computational requirements to derive the Mth-order matrices.

Nevertheless, it must be underlined that, for each order, the
smallest eigenvalue is $\sigma^{2M}$, which provides a convenient way of
checking the degree of convergence for the corresponding estimated
Mth-order matrix compared to the first order.

The most serious limitation of the method is that, so far, it
can only be applied to the particular case where all the sources are
coherent.


X.  CONCLUSIONS 


Team differential games feature several particular problems.
Classical N-player games are, generally, N-point boundary value
problems but an (N-1)-pursuer, 1-evader team differential game
includes N-2 extra unknowns due to the fact that optimality applies
to the team as a unit and not to individuals. For minimum-time
problems, these unknowns are modeled as strategic variables whose
values quantify the activity of the pursuers. Singularities that
play a major role in the solution of a two-player differential game
do not appear nearly as important for team games. On the other
hand, the study of the game of kind is a lot more complex and crucial.

Early choices on the kind of cooperation allowed between teammates
are reflected in the form of the performance index. Simpler
choices yield formulae easily generalized at the expense of a reduced
interaction level in the team, hampering the result. More
powerful team structures can be treated classically only by increasing
the dimensions of the game. Through a suitable definition of
the reduced coordinates, convenient studies of team games can be
conducted but depend on the type of analysis conducted for the very
type of game.

Due to its relative simplicity, the one-versus-one game can
often be studied quite extensively. In most instances, the addition
of an extra pursuer to a team is not directly reflected in the form
of the controls of the pursuers as it is in the control of the evader.
Moreover, useful time and performance limitations to the team game
can be derived from the 1-versus-1 game. It then allows a study of
the team game from geometrical analogies or the computation of
approximate solutions which are tailored to a given game, such as the
composition approximation or the simple suboptimal solution to the
linear quadratic team game. The same approach can even yield the exact
solution to the otherwise intractable problem of optimal location of a
pursuer in a team. Maximum team controllability criteria, though
generally non-game optimal, still provide handy relationships to
approximate terminal time state unknowns.

Because of the complexity of team differential games, structures
and hierarchies arise naturally. Analysis or computational burden
is the reason for introducing structures that break the solution
into easier steps, as in the games involving a minimum operator in
the performance index that show two distinct phases or as in the
composition approximation. When the hierarchy that corresponds to the
complete solution is prohibitively complex, a careful sensitivity
study can yield suboptimal hierarchies to reduce both the information
structure and the computations required.

The  stochastic  team  differential  game  investigated  shows  that 
hierarchical  choices  must  be  made  beforehand  that  reflect  a  team 
philosophy  or  an  early  strategic  option.  Decentralized  structures 
are  probably  more  adequately  adapted  to  team  games  but  optimality  is 
usually  not  achieved. 

The nonlinear approach to the direction finding problem addresses
the problems of jamming and multipath propagation. Although
computationally costly, it shows great improvements when the coherence
level of the received signals is high. For a given number of
sensors, it enables the solution of a wider class of problems, but
the observation time sets a limitation on the order of the method,
thereby limiting the possibilities.